CHAPTER ONE


INTRODUCTION 

During the last decade, considerable attention has been given
to the design of parallel computers and the development of parallel
numerical procedures for identification, estimation and control which are
implementable on these machines. As an introduction to this topic, some
background material is given in Section 1.1. A brief review of existing
parallel algorithms for identification, estimation and control is
presented in Section 1.2. The motivation and significance of the
research presented in this report are discussed in Section 1.3. Finally,
the objectives and contributions of this report are stated in Section 1.4.

1.1  Background 

The design of an automatic control system generally involves
the selection of additional components, usually with adjustable
parameters, such that the overall system meets a desired performance
specification. For example, this performance specification may be
formulated in terms of the minimization of an error criterion, settling
time, or energy constraint, or it may simply require a stable response.
The performance index provides a quantitative measure of system
performance and is chosen to emphasize important system characteristics.
This type of quantitative measure is very important for parameter
identification, state estimation, and for the design of optimal and
suboptimal control systems.


The early work in the area of parameter identification can
be attributed to Nyquist [1] and Bode [2], in which frequency analysis
methods were used in conjunction with logarithmic frequency diagrams
to fit parametric models to data.

More recently, parametric models used in "modern" control
theory have been formulated in terms of state equations. The need to
determine such models from experimental data has led to a continual
effort to improve parameter identification and state estimation
procedures. Probably the oldest and most widely known methods for
performing these tasks include: maximum likelihood techniques [3], Kalman
filters [4], weighted least squares procedures [5], and stochastic
approximation [6]. These sequential methods, however, may require a
prohibitive amount of computer time to converge, if in fact convergence
occurs at all.

Since the introduction of time domain methods and the development
of optimal control theory, a dramatic change in the design of automatic
control systems has occurred. Because the use of optimal control
theory generally results in the need to solve a highly nonlinear
two-point boundary value problem (NTPBVP), much research has been conducted
in the area of numerical methods for solving NTPBVP's. Currently, the
most popular methods for solving NTPBVP's include iterative techniques
such as shooting methods [7] and quasilinearization [8] or non-iterative
methods such as invariant imbedding [9]. These methods, however,
suffer from the fact that convergence to the optimal solution is rather
time consuming if these procedures are implemented on a conventional
computer. One way to correct this problem might be to design faster
computer systems, or simply develop more efficient algorithms.
Apparently at this time, many felt that the numerical methods developed
to date were rather efficient (although sequential in nature) and
that the problem of excessive computation time could be most easily
dealt with by the design of faster (parallel) computers.

In view of the need for faster computers, the computer industry
has seen a significant change in the architecture of modern computers.
This has led to the development of parallel computers which are capable
of performing the same set of instructions on many data sets
simultaneously (see Figure 1.1). Basically, a parallel computer can be viewed
as a set of processing elements (PE's), each of which has its own local
memory (LM) and a repertoire of arithmetic and logical instructions.
The role of the central processor (CP) is to coordinate the efforts of
each PE while the LM is used for temporarily storing intermediate results.
Each PE is synchronized to perform the same instruction at the same time
on data located in its own memory. When an instruction set has been
completed by each processor, the results are transferred to global memory
where the central processor interprets the results and decides whether
to continue computations or halt. Note that if N processors are
available and calculations are organized such that each PE is being fully
utilized, then the speed of computation would be N times faster than the
speed of a single PE. It is clear, however, that achieving this increase
in speed requires great care in structuring computational algorithms.
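
The N-fold figure holds only when every PE is busy on every pass. The short Python sketch below illustrates this with a deliberately simple model that is an assumption of this example, not something taken from the report: the work consists of equal-cost independent tasks, and the PEs advance in lockstep, so a pass takes the same time whether all PEs or only one of them has a task. The names `lockstep_time` and `speedup` are hypothetical.

```python
import math


def lockstep_time(num_tasks: int, num_pes: int) -> int:
    """Number of lockstep passes needed for num_tasks equal-cost tasks on num_pes PEs."""
    return math.ceil(num_tasks / num_pes)


def speedup(num_tasks: int, num_pes: int) -> float:
    """Speedup relative to a single PE that processes the tasks one after another."""
    return num_tasks / lockstep_time(num_tasks, num_pes)


if __name__ == "__main__":
    # 64 tasks fill 8 PEs exactly; 65 tasks force a ninth, nearly idle pass.
    for tasks in (64, 65, 5):
        print(tasks, "tasks on 8 PEs -> speedup", round(speedup(tasks, 8), 3))
```

Under this model the full factor of 8 is reached only when the task count is a multiple of the number of PEs, which is one simple way to see why the algorithm has to be structured around the machine.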

Recognizing this fact, Larsen, et al. [10], [11] were the first
to seriously consider using the parallel computing capabilities of
modern parallel computers to allow the implementation of nonlinear
estimation and control procedures. Much of the work in these papers
was primarily concerned with restructuring dynamic programming so that
many calculations could be performed by a computer with the facility
for large scale parallel processing. Unfortunately, in many cases,
the number of processors required by this method can be too extensive
to be very practical.

FIGURE 1.1: A Parallel Computer Organization

For  a  number  of  years  after  the  initial  efforts  of  Larsen  and 
Tse,  it  appeared  that  the  development  of  parallel  algorithms  for  non¬ 
linear  estimation  and  control  had  ceased.  This  occurrence  might  have 
been  due  to  the  many  problems  associated  with  the  Illiac  IV  (the  first 
truly  parallel  computer)  [12]. 

Recently, however, there has been a renewed interest in parallel
algorithms due to the successful development of many parallel computers
(see Table 1.1). Due to the availability of these machines, many new
nonlinear estimation and control algorithms have been proposed, the details
of which will be discussed in the following section.

1.2 A Survey of Parallelism in Identification,
Estimation and Control

The  idea  of  structuring  estimation  and  control  algorithms  such 
that  many  operations  may  be  performed  simultaneously  has  only  recently 
been  considered.  In  fact,  this  area  is  so  new  that  at  the  present  time 
only  a  small  number  of  parallel  algorithms  exist  to  perform  such  tasks. 

To survey these methods, we will consider the topics of identification,
estimation and control separately in the remainder of this section.

Parallel Parameter Identification

In reference [13], Reid is concerned with identifying the
parameters of a linear time invariant system from observing noise free
output measurements. Basically, Reid's method employs an algebraic
representation of the parameter sensitivity variables to efficiently
and accurately obtain the system component matrices and component
sensitivities, which are then used to form a linear system of equations
and a small number of quadrature integrals. The parallel aspects of
this algorithm lie in the fact that the quadrature integrals may be
evaluated simultaneously. Also, parallelism can be exploited in solving
the linear system of equations by utilizing the procedures discussed
in references [14] and [15].

The structure of Reid's algorithm also leads to a natural set
of conditions which are useful in determining if a system is
identifiable. For example, for a system to be locally identifiable, the
sensitivity matrix must have rank p where p is the number of unknown
parameters. Other results on "structural identifiability," "excitation
identifiability," and their "quality" are also reported by Reid in
references [13] and [16]. Because Reid relies heavily on linearity and
other properties of linear systems in the development of his method, it
is restricted to linear dynamical systems. This, along with the fact that
measurement noise and process noise are omitted from the problem
formulation, seriously limits the application of this method.
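
The rank condition quoted above can be checked numerically on a small case. The following Python sketch is only an illustration under stated assumptions: the two toy output models, the sample times, and the helper names `matrix_rank` and `sensitivity_rows` are hypothetical and are not Reid's algebraic construction. Each row of the sensitivity matrix holds the partial derivatives of the output with respect to the p unknown parameters at one sample time.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in rows]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        # Choose the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < tol:
            continue  # no usable pivot: this column adds no new direction
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
        if rank == n_rows:
            break
    return rank


def sensitivity_rows(partials, times):
    """One row per sample time; partials is a list of functions dy/dtheta_k(t)."""
    return [[d(t) for d in partials] for t in times]


times = (1.0, 2.0, 3.0)
a_val, b_val = 2.0, 3.0

# Model 1 (hypothetical): y = a*b*t, so only the product a*b affects the output.
s_product = sensitivity_rows([lambda t: b_val * t, lambda t: a_val * t], times)
# Model 2 (hypothetical): y = a*t + b, where both parameters act independently.
s_affine = sensitivity_rows([lambda t: t, lambda t: 1.0], times)

print(matrix_rank(s_product), matrix_rank(s_affine))
```

With p = 2 unknown parameters, the first model fails the rank test (its two sensitivity columns are proportional) while the second passes it.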

Parallel  State  Estimation 

The approach to the linear state estimation problem taken in
reference [17] is to develop an explicit square-root algorithm which
allows parallel processing with little communication between processors.
The method is based upon a modification of the Kalman filter algorithm.
Basically, an interval of data is partitioned into N subintervals and
state estimates are calculated based only on data within each
subinterval. These calculations are performed simultaneously by N processors
working independently. When each processor completes its task, some
global calculations are performed and the results are combined to obtain
an overall optimal estimate at the subinterval endpoints. At this point,
the estimates at the subinterval interior points may be updated if
desired.

The most expensive computation required by this procedure is
estimating the states at the subinterval endpoints. Generally, this
procedure requires about 14-40% more computations than a conventional
Kalman filter but, because many of these computations can be performed
in parallel, the actual execution time may indeed be much less. As
pointed out by Kailath [17], for this method to be faster than a
single Kalman filter, the system should be high order and have a sparse
system matrix, and the state estimates must be desired infrequently
compared to the number of data points. To help speed computations, a
square-root doubling formula is introduced by Kailath for calculating
the steady-state covariance matrix of time invariant systems.

Although Kailath argues that the parallel square-root algorithm
can be more efficient than a single Kalman filter, this is not
verified through simulation. Also, it should be noted that this method
is only applicable to state estimation of linear systems.

Parallel Maximum Likelihood Estimation

In reference [11], Larsen and Tse restructure the dynamic
programming method to estimate the maximum likelihood trajectory of a
nonlinear discrete-time system. The approach taken by Larsen and Tse
is to decompose the dynamic programming algorithm into three parts which
consist of a parallel states algorithm, a parallel noises algorithm,
and a parallel state and stages algorithm.

The major advantage of the parallel states algorithm is that
the calculations performed by each processor are the same. Although
this is highly desirable, there are several "overhead" calculations (such
as binary search and compare) associated with the parallel noises
algorithm which require moderate communication between processors and
peripheral devices.

Also, the major shortcoming of this approach lies in the fact
that the parallel states and stages algorithm can require a prohibitive
number of processors. For example, a problem with 10 states and
100 stages would require 10 x 100 = 1000 processing elements.
Although these processors need not be very sophisticated, such a large
number of them may lead to reliability and synchronization problems.

Finally, Larsen and Tse do not discuss the application of their
parallel dynamic programming algorithm to any problem of practical
interest.

Parallel State and Parameter Estimation

The parallel state and parameter (SAP) estimation algorithm
reported in reference [18] has been developed
for discrete linear time invariant systems whose outputs are corrupted
by a white Gaussian noise (WGN) process. This parallel Bayesian algorithm
employs many extended Kalman filters (EKF's) which simultaneously perform
the SAP estimation functions. The integration of the a posteriori
density function required by this method is approximated by a finite
sum whose elements may be computed independently. Note that this method
of integration is a variation of the rectangle rule, which is known to
be extremely inaccurate unless many grid points are used. Since the
number of parallel filters is equal to the number of grid points, it is
entirely possible that a rather large number of filters may be required
to implement this procedure, especially if accuracy is a major
consideration.
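
The dependence of the rectangle rule on the number of grid points can be seen on a one-dimensional integral. The Python sketch below is an assumed stand-in, not the a posteriori density integration of the referenced algorithm: it integrates exp(x) over [0, 1] with a left-endpoint rectangle rule, and the function name `rectangle_rule` is hypothetical. Each term of the sum depends on a single grid point only, which is the property that lets one filter per grid point work independently.

```python
import math


def rectangle_rule(f, a, b, n_points):
    """Left-endpoint rectangle rule on n_points equal cells of [a, b]."""
    h = (b - a) / n_points
    # Every term uses one grid point only, so the terms could be evaluated
    # by separate processors and summed afterwards.
    terms = [f(a + i * h) * h for i in range(n_points)]
    return sum(terms)


exact = math.e - 1.0  # integral of exp(x) over [0, 1]
errors = {n: abs(rectangle_rule(math.exp, 0.0, 1.0, n) - exact)
          for n in (10, 100, 1000)}

for n, err in errors.items():
    print(f"{n:5d} grid points -> absolute error {err:.2e}")
```

In this example each tenfold increase in grid points reduces the error by roughly a factor of ten, so every additional digit of accuracy costs about ten times as many grid points (and, in the algorithm discussed above, ten times as many filters).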

As an indicator of performance, the parallel SAP estimation
algorithm was tested by solving both a first and second order linear
state estimation problem. For these simple examples, the simulations
performed in reference [18] indicate that 8-20 parallel filters could
be used without much loss in accuracy of the estimates. Although this
is encouraging, it seems more appropriate to test this method on a low
order, but highly nonlinear, process in which both the state and
parameters of the process must be estimated. For this problem, it seems
clear that a trade-off must be made between accuracy and the number of
parallel filters required for the procedure to converge.

Parallel  Control  Algorithm 

One of the first attempts to use parallelism to speed up
optimal control computations was reported by Larsen and Tse in
reference [10]. In this paper, a parallel dynamic programming algorithm
for solving both deterministic and stochastic nonlinear control
problems was developed.

Basically, their method is based upon a decomposition of the
dynamic programming algorithm into a parallel states algorithm, a
parallel state and stages algorithm, a parallel decisions algorithm, a
parallel successive approximations algorithm, a parallel shift vector
algorithm, and a parallel state increment dynamic programming algorithm.

Larsen and Tse's method suffers from the same problems as
their maximum likelihood method previously discussed, as well as the
fact that the parallel state increment dynamic programming algorithm
employs Euler's method to integrate the right-hand side of a nonlinear
system's equations of motion. Since it is well known that Euler's
method is not very accurate unless extremely small steps are taken, it
appears that this approach is of little practical value.
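
The remark about Euler's method can be made concrete with a small experiment. The Python sketch below uses an assumed test problem, x' = -x^2 with x(0) = 1 (exact solution 1/(1 + t)), chosen only because it is nonlinear and has a closed form; it is not one of the systems treated by Larsen and Tse, and the function name `euler` is hypothetical.

```python
def euler(f, t0, x0, t_end, n_steps):
    """Integrate x' = f(t, x) from t0 to t_end with n_steps forward Euler steps."""
    h = (t_end - t0) / n_steps
    t, x = t0, x0
    for _ in range(n_steps):
        x += h * f(t, x)
        t += h
    return x


def rhs(t, x):
    """Assumed nonlinear test problem: x' = -x^2."""
    return -x * x


exact_at_1 = 0.5  # x(1) = 1 / (1 + 1)
euler_errors = {n: abs(euler(rhs, 0.0, 1.0, 1.0, n) - exact_at_1)
                for n in (10, 100, 1000)}

for n, err in euler_errors.items():
    print(f"{n:5d} steps -> error at t = 1: {err:.2e}")
```

The error shrinks only in proportion to the step size, so each extra digit of accuracy requires roughly ten times as many steps.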

1.3  Motivation  and  Significance 

Although nonlinear optimal control and estimation theory has
been known for a number of years, the development of practical
numerical methods based upon this theory has been relatively slow. As
pointed out in the previous section, the major problems with existing
methods have been a lack of accuracy and excessive computation time, and
the use of parallelism has been proposed to alleviate such problems.

The survey of existing parallel estimation and control
algorithms indicates that there exists a need to develop more efficient
parallel procedures based upon modern nonlinear estimation and control
theory. This fact has motivated an investigation of several parallel
procedures with the hope that the computation time required for
convergence of the new procedures could be significantly less than
existing methods.

The development of more efficient parallel estimation and
control algorithms is significant since:

•  Parallelism  should  enable  the  design  of  nonlinear  control  systems 
without  the  need  for  approximating  the  behavior  of  highly  nonlinear 
equations  of  motion  by  a  linearized  model. 

•  Parallel  implementation  could  speed  up  computation  time  enough  to 
permit  modern  nonlinear  identification,  estimation,  and  control 
algorithms  to  be  executed  in  real  time. 

With  this  in  mind,  the  objectives  and  contributions  of  this 
thesis  will  now  be  clearly  stated. 

1.4 Objectives and Contributions

In Section 1.2, some existing parallel identification,
estimation, and control algorithms were surveyed and their usefulness
and drawbacks were evaluated in terms of accuracy, speed, processor
requirements and numerical efficiency. From this survey, it is
clear that none of the existing parallel algorithms meets all of these
requirements. In view of the above, the objectives of this report are
to:

• Develop computationally efficient procedures for the identification,
estimation and control of nonlinear dynamical systems which employ
a high degree of parallelism but at the same time are not extravagant
in the utilization of processing elements.

•  Investigate  both  analytically  and  through  computer  simulation  the 
convergence  properties  of  the  newly  developed  procedures. 

• [...] obtainable through utilizing the nonlinear equations of [...]
when designing controllers [...] systems.

• Study the robustness of the parallel algorithms and [...]
values of certain algorithm parameters so that near optimum performance
will result for a variety of problems.

• Propose a parallel computer architecture suitable for implementing
the newly developed parallel methods.


The major contributions which resulted from conducting this
research are:

• Development of a class of parallel rank-two quasi-Newton methods
for unconstrained minimization.

• Establishment of a strategy for optimally selecting the number of
subintervals and mesh points associated with the parallel shooting
approach to solving nonlinear two-point boundary value problems.

• Development of a procedure which automatically adjusts the step size
of a parallel predictor-corrector integration scheme to maintain a
desired level of accuracy.

• Demonstration with representative examples that the newly developed
parallel algorithms do indeed perform better than existing sequential
methods in terms of speed, accuracy, and reliability.

• Application of the PQN, PVM, and 1M methods to solving dynamic
optimization problems (such as nonlinear estimation and control
problems) [...] than static optimization problems involving
[...] functions.
CHAPTER TWO


PARALLEL ALGORITHMS FOR THE IDENTIFICATION,
ESTIMATION AND CONTROL OF NONLINEAR DYNAMICAL SYSTEMS

In this chapter, a collection of parallel algorithms is
described which can be used to solve nonlinear estimation and control
problems on a computer with the facility for large scale parallel
processing. We shall begin our discussion of these techniques by
formulating the nonlinear estimation and control problems in Sections
2.1.1 and 2.2.1 respectively. In Section 2.1, parallel methods for
simultaneous state and parameter (SAP) estimation are presented while
parallel control algorithms are discussed in Section 2.2. In Section
2.3, these methods are combined so that the estimation and control
functions can be performed on-line in an adaptive fashion.

It should be emphasized that the goal of this chapter is to
develop algorithms which possess a high degree of parallelism but at
the same time do not require an excessive number of processors. This,
along with the fact that the parallel algorithms presented in Sections
2.1, 2.2, and 2.3 should be capable of handling the nonlinear process
equations directly, represent two of the contributions of this report.

Finally, one of the more subtle contributions developed in
this chapter is an adaptive mesh selection algorithm which optimally
places the mesh points needed by the method of parallel shooting. Note
that the mesh point placement is optimal in the sense of minimizing the
maximum local truncation error associated with integrating differential
equations numerically.


2.1 Parameter Identification and State Estimation Algorithms

In this section, two methods are presented for simultaneously
estimating the state and identifying the parameters of a nonlinear
dynamical system. The first method is based upon solving a nonlinear
two-point boundary value problem (NTPBVP) while the second method
requires a direct minimization of a suitably defined cost function.
Before the details of these methods are given, a formal statement of
the state and parameter (SAP) estimation problem is in order.

2.1.1  Problem  Statement 

Consider  the  nonlinear  dynamical  system  and  measurement 
model  represented  by 

    $\dot{x}(t) = f[x(t),t] + G[x(t),t]\,w(t)$    (2.1-1)

    $z(t) = h[x(t),t] + v(t)$    (2.1-2)

where $x(t) \in R^n$ is an augmented state vector which includes the
unknown parameters, $w(t) \in R^p$ is a process noise, and $z(t) \in R^n$
is the measurement vector which has been corrupted by the
measurement noise v(t).

It is assumed that:

• The initial state of this process is Gaussian with mean $m_{x_0}$ and
covariance

    $E\{x(t_0)\,x^T(t_0)\} = P_{x_0}$

• The noise processes w(t) and v(t) are mutually independent zero-mean
white Gaussian noise (WGN) processes with corresponding covariance
matrices

    $E\{w(t)\,w^T(s)\} = Q(t)\,\delta(t-s)$,    $t_0 \le t, s \le t_f$

    $E\{v(t)\,v^T(s)\} = R(t)\,\delta(t-s)$,    $t_0 \le t, s \le t_f$

• Q(t) and R(t) are positive definite symmetric so that $Q^{-1}(t)$ and
$R^{-1}(t)$ exist $\forall\, t \in [t_0, t_f]$.

Let us define $Z_s = \{z(\tau) \mid t_0 \le \tau \le s\}$ as the
accumulated noisy state measurements up to and including time s. The
problem is to obtain an estimate of the augmented state vector x(t) at
time t on the basis of the observations represented by $Z_s$. Our
interest will be restricted to the case when s > t, in which case
$\hat{x}_{t|s}$ is referred to as a "smoothed" estimate of x(t).

By defining $p[x; t \mid Z_s]$ as the a posteriori probability that
the state vector assumes the value x at time t conditioned upon
the measurement data represented by $Z_s$, the maximum-a-posteriori (MAP)
estimate of x(t) (denoted as $\hat{x}^{MAP}_{t|s}$) is defined by

    $p[\hat{x}^{MAP}_{t|s}; t \mid Z_s] = \max_{x \in R^n} p[x; t \mid Z_s]$,    $t_0 \le t \le s \le t_f$    (2.1-3)


It has been shown (cf. [19], [20]) that the maximization
indicated in eqn. (2.1-3) is equivalent to finding the deterministic
signal w(t), $t \in [t_0, t_f]$, which minimizes the functional

    $J = \frac{1}{2}\|x(t_0) - m_{x_0}\|^2_{P_{x_0}^{-1}} + \frac{1}{2}\int_{t_0}^{t_f} \{\|z(t) - h[x(t),t]\|^2_{R^{-1}(t)} + \|w(t)\|^2_{Q^{-1}(t)}\}\,dt$    (2.1-4)

subject to the dynamic equality constraint given by

    $\dot{x}(t) = f[x(t),t] + G[x(t),t]\,w(t)$    $\forall\, t \in [t_0, t_f]$    (2.1-5)


Note that the SAP estimation problem has been converted to a
deterministic optimization problem and that once w(t) is found such that
eqn. (2.1-4) is minimized, we may integrate eqn. (2.1-5) to obtain the
MAP estimate of x(t) provided $x(t_0)$ is known. Although this approach
to the SAP estimation problem is not new, the use of parallelism to
expedite the search for w(t) as well as in integrating eqn. (2.1-5) has not
been considered before. These ideas are explored further in the next
section.
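
To make the deterministic problem concrete, the following Python sketch evaluates a discretized version of the functional (2.1-4) for a scalar system. Several things here are assumptions of the sketch rather than part of the report: the state is scalar, eqn. (2.1-5) is propagated with forward Euler, the integral is replaced by a left-endpoint sum, and the dynamics x' = -x^3 with h(x) = x are an arbitrary test case. The name `map_cost` and its argument names are hypothetical.

```python
def map_cost(x0, w, z, dt, f, h, g, m_x0, p_x0, q, r):
    """Approximate J of eqn. (2.1-4) for a scalar system.

    x0     : candidate initial state
    w, z   : sequences of process-noise guesses and measurements on the grid
    f, h, g: callables f(x, t), h(x, t), g(x, t) from eqns. (2.1-1) and (2.1-2)
    The state is propagated with forward Euler and the integral is a
    left-endpoint sum; both are simplifications made for this sketch.
    """
    cost = 0.5 * (x0 - m_x0) ** 2 / p_x0
    x = x0
    trajectory = [x]
    for k, (w_k, z_k) in enumerate(zip(w, z)):
        t_k = k * dt
        cost += 0.5 * ((z_k - h(x, t_k)) ** 2 / r + w_k ** 2 / q) * dt
        x = x + dt * (f(x, t_k) + g(x, t_k) * w_k)
        trajectory.append(x)
    return cost, trajectory


def f(x, t):
    return -x ** 3


def h(x, t):
    return x


def g(x, t):
    return 1.0


n, dt = 50, 0.02
zero_noise = [0.0] * n

# Build noise-free "measurements" from the true trajectory started at x0 = 1.
_, truth = map_cost(1.0, zero_noise, zero_noise, dt, f, h, g, 1.0, 0.1, 0.5, 0.2)
z_data = truth[:-1]

cost_true, _ = map_cost(1.0, zero_noise, z_data, dt, f, h, g, 1.0, 0.1, 0.5, 0.2)
cost_off, _ = map_cost(1.2, zero_noise, z_data, dt, f, h, g, 1.0, 0.1, 0.5, 0.2)
print(cost_true, cost_off)
```

In this noise-free test the true initial state with w = 0 gives zero cost, while a perturbed initial state is penalized both by the prior term and by the measurement residuals.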

2.1.2  Indirect  State  and  Parameter  Estimation  Algorithm 


In this section, a parallel numerical method based upon the
calculus of variations is presented for simultaneously estimating the
state and parameters of a nonlinear dynamical system. Although this
method can be used to solve state and parameter (SAP) estimation
problems which do not incorporate process noise into the state model,
the method will be presented assuming that a noise process is used
to account for modelling uncertainties. With this in mind, consider
the optimization problem defined by eqns. (2.1-4) and (2.1-5).

To find $\hat{w}(t)$ using the calculus of variations, let the
Hamiltonian be defined as

    $H = \frac{1}{2}\{\|z(t) - h[x(t),t]\|^2_{R^{-1}(t)} + \|w(t)\|^2_{Q^{-1}(t)}\} + \lambda^T(t)\{f[x(t),t] + G[x(t),t]\,w(t)\}$    $\forall\, t \in [t_0, t_f]$    (2.1.2-1)

Then the necessary conditions for optimality become:

    $\frac{\partial H}{\partial w} = 0 \;\Rightarrow\; \hat{w}(t) = -Q(t)\,G^T[x(t),t]\,\lambda(t)$    (2.1.2-2)

    $\dot{x}(t) = f[x(t),t] + G[x(t),t]\,\hat{w}(t) \triangleq f[x(t), \lambda(t), t]$    (2.1.2-3)

    $\dot{\lambda}(t) = \frac{\partial h^T[x(t),t]}{\partial x(t)} R^{-1}(t)\{z(t) - h[x(t),t]\} - \frac{\partial f^T[x(t),t]}{\partial x(t)} \lambda(t) - \frac{\partial \{\lambda^T(t)\,G[x(t),t]\,\hat{w}(t)\}}{\partial x(t)} \triangleq g[x(t), \lambda(t), t]$    (2.1.2-4)
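
As a brief check of the stationarity condition, the step from the Hamiltonian to (2.1.2-2) can be written out explicitly. The derivation below is a sketch that assumes the weighted norm is the quadratic form $\|w\|^2_{Q^{-1}} = w^T Q^{-1} w$ and uses the symmetry of Q(t) stated in Section 2.1.1; only the two terms of H that contain w contribute.

```latex
\begin{align*}
\frac{\partial H}{\partial w}
  &= \frac{\partial}{\partial w}\left\{ \tfrac{1}{2}\, w^T(t)\, Q^{-1}(t)\, w(t)
     + \lambda^T(t)\, G[x(t),t]\, w(t) \right\} \\
  &= Q^{-1}(t)\, w(t) + G^T[x(t),t]\, \lambda(t) = 0 \\
\Rightarrow\quad \hat{w}(t) &= -Q(t)\, G^T[x(t),t]\, \lambda(t)
\end{align*}
```

Because Q(t) is positive definite, H is strictly convex in w, so this stationary point is the minimizing value.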


The boundary conditions associated with eqns. (2.1.2-2)-
(2.1.2-4) are given by

    $\lambda(t_0) = -P_{x_0}^{-1}[x(t_0) - m_{x_0}]$    (2.1.2-5a)

    $\lambda(t_f) = 0$    (2.1.2-5b)

In principle, the solution to the nonlinear two-point
boundary value problem (NTPBVP) represented by eqns. (2.1.2-2), (2.1.2-3),
(2.1.2-4) and (2.1.2-5) can be obtained using ordinary shooting [7]
but, because computational problems can arise when integrating
eqns. (2.1.2-3) and (2.1.2-4) forward in time, the following parallel
shooting procedure, which is a modification of Keller's approach [21],
is recommended.


Parallel  Shooting  Solution  of  Nonlinear  SAP  Estimation  Problems 

To illustrate the procedure, eqns. (2.1.2-3) and (2.1.2-4) are
concatenated as follows:

    $\begin{bmatrix} \dot{x}(t) \\ \dot{\lambda}(t) \end{bmatrix} = \begin{bmatrix} f[x(t), \lambda(t), t] \\ g[x(t), \lambda(t), t] \end{bmatrix} \triangleq s[x(t), \lambda(t), t]$    (2.1.2-6a)

    $A \begin{bmatrix} x(t_0) \\ \lambda(t_0) \end{bmatrix} + B \begin{bmatrix} x(t_f) \\ \lambda(t_f) \end{bmatrix} = a$    (2.1.2-6b)


where a is a 2n vector of known boundary conditions and the elements
of the 2n×2n matrices, A and B, are chosen such that eqn. (2.1.2-6b)
is satisfied. By defining $y(t) = [x^T(t), \lambda^T(t)]^T$ and partitioning
the interval $[t_0, t_f]$ into N subintervals $t_0 < t_1 < \cdots < t_N = t_f$,
the NTPBVP represented by eqn. (2.1.2-6) can be written as

    $\dot{y}_j(t) = s(y_j(t), t)$,    j = 0, 1, ..., N-1    (2.1.2-7a)

    $A\,y_0(t_0) + B\,y_{N-1}(t_f) = a$    (2.1.2-7b)

where $y_j(t)$ denotes $y(t)$ for $t \in [t_j, t_{j+1}]$.


Since y(t) is required to be continuous at the partition points, it is
necessary that

    $y_{j-1}(t_j) = y_j(t_j)$,    j = 1, 2, ..., N-1    (2.1.2-8)


Combining eqns. (2.1.2-7) and (2.1.2-8) results in the following
NTPBVP:

    $\dot{Y}(t) = F(Y(t), t)$,    $t \in [t_0, t_f]$    (2.1.2-9a)

    $P\,Y_\ell + Q\,Y_r = \gamma$    (2.1.2-9b)

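Here $Y(t)$, $F(Y(t), t)$, $Y_\ell$, $Y_r$ and $\gamma$ are 2nN vectors defined as:

    $Y(t) \triangleq \begin{bmatrix} y_0(t) \\ y_1(t) \\ \vdots \\ y_{N-1}(t) \end{bmatrix}$,    $F(Y(t), t) \triangleq \begin{bmatrix} s(y_0(t), t) \\ s(y_1(t), t) \\ \vdots \\ s(y_{N-1}(t), t) \end{bmatrix}$

    $Y_\ell \triangleq \begin{bmatrix} y_0(t_0) \\ y_1(t_1) \\ \vdots \\ y_{N-1}(t_{N-1}) \end{bmatrix}$,    $Y_r \triangleq \begin{bmatrix} y_0(t_1) \\ y_1(t_2) \\ \vdots \\ y_{N-1}(t_N) \end{bmatrix}$,    $\gamma \triangleq \begin{bmatrix} a \\ 0 \\ \vdots \\ 0 \end{bmatrix}$

with P and Q being 2nN×2nN matrices of the form

    $P = \begin{bmatrix} A & 0 & \cdots & 0 \\ 0 & I & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & I \end{bmatrix}$,    $Q = \begin{bmatrix} 0 & 0 & \cdots & 0 & B \\ -I & 0 & \cdots & 0 & 0 \\ 0 & -I & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & -I & 0 \end{bmatrix}$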
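
The block structure of P and Q can be checked mechanically. The Python sketch below assembles both matrices from given 2n×2n blocks A and B and evaluates the residual of (2.1.2-9b); the function names `build_pq` and `residual`, the use of nested lists, and the small numerical example are assumptions of this sketch. The first block row reproduces the boundary condition (2.1.2-7b) and each later block row reproduces one continuity condition (2.1.2-8).

```python
def build_pq(a_blk, b_blk, n_sub):
    """Assemble P and Q of eqn. (2.1.2-9b) for n_sub subintervals.

    a_blk, b_blk : square blocks A and B (each 2n x 2n) as nested lists
    Returns P and Q as nested lists of size (2n * n_sub) x (2n * n_sub).
    """
    d = len(a_blk)                      # d = 2n
    size = d * n_sub
    p = [[0.0] * size for _ in range(size)]
    q = [[0.0] * size for _ in range(size)]
    for i in range(d):
        for j in range(d):
            p[i][j] = a_blk[i][j]                       # A acts on y_0(t_0)
            q[i][(n_sub - 1) * d + j] = b_blk[i][j]     # B acts on y_{N-1}(t_N)
    for blk in range(1, n_sub):
        for i in range(d):
            p[blk * d + i][blk * d + i] = 1.0           # +I on y_blk(t_blk)
            q[blk * d + i][(blk - 1) * d + i] = -1.0    # -I on y_{blk-1}(t_blk)
    return p, q


def residual(p, q, y_left, y_right, gamma):
    """Component-wise value of P*Y_l + Q*Y_r - gamma."""
    size = len(gamma)
    return [sum(p[i][k] * y_left[k] + q[i][k] * y_right[k] for k in range(size))
            - gamma[i] for i in range(size)]


# Hypothetical example with 2n = 2 and N = 3 subintervals.
A = [[1.0, 0.0], [0.0, 0.0]]   # picks the first component at t_0
B = [[0.0, 0.0], [0.0, 1.0]]   # picks the second component at t_f
P, Q = build_pq(A, B, 3)

y_l = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # y_0(t_0), y_1(t_1), y_2(t_2)
y_r = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # y_0(t_1), y_1(t_2), y_2(t_3)
gam = [1.0, 8.0, 0.0, 0.0, 0.0, 0.0]   # a = [1, 8], then zeros
print(residual(P, Q, y_l, y_r, gam))
```

With a single subinterval the assembly returns P = A and Q = B, which matches the boundary condition of ordinary shooting.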


Note that if we could find the constant vector $Y_\ell$ which
minimizes

    $E = \|P\,Y_\ell + Q\,Y_r - \gamma\|^2$    (2.1.2-10)

subject to the dynamic constraint (2.1.2-9a), we would have an estimate
of the unknown states and parameters. Because the components of
(2.1.2-9a) are uncoupled, they may be integrated using a parallel
integration algorithm by simultaneously shooting over each
subinterval. Because each subinterval is much shorter than the interval
$[t_0, t_f]$, it becomes less likely that the integration of the components
of (2.1.2-9a) will diverge. As a result, convergence problems
associated with excessively large intermediate values of x(t) or
$\lambda(t)$ may be alleviated by adopting parallel shooting.

In  summary  then,  the  parallel  algorithm  to  be  implemented 
would  be: 


Step 0: Record the noisy observations $z(t)$, $t \in [t_0, t_f]$, arbitrarily
choose the components of $Y_\ell$, and set $x(t_0) = m_{x_0}$, where $m_{x_0}$ is
the a priori mean value vector of the augmented initial state
vector $x(t_0)$.

Step 1: Find $\lambda(t_0)$ by solving the linear system
$$P_{x_0}\,\lambda(t_0) = m_{x_0} - x(t_0)$$
where $P_{x_0}$ is the a priori covariance matrix of $x(t_0)$.

Step 2: Using $x(t_0)$, $\lambda(t_0)$ as the first $2n$ components of $Y_\ell$ and the
remaining components of $Y_\ell$, integrate the components of
(2.1.2-9a) over each subinterval by employing a parallel
integration method. Then record $Y_r$ and $x(t)$ $\forall\, t \in [t_0, t_f]$.

Step 3: Evaluate the error function
$$E \triangleq \| P Y_\ell + Q Y_r - \gamma \|^2$$

Step 4: If the error function is sufficiently small, the currently
recorded values of $x(t)$ are accepted as the "smoothed" estimates
we desire. Hence, we may terminate the algorithm. Otherwise,
using the recorded noisy measurements, compute a new value of
$Y_\ell$ so as to minimize $E$ by employing a parallel minimization
method. Now return to Step 1.

Note that the parallel algorithm above reduces to a parallel
version of ordinary shooting if only one subinterval is used.
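
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `rk4` and `shooting_error`, the fixed-step Runge-Kutta integrator, and the toy linear system $\dot{x} = \lambda$, $\dot{\lambda} = x$ with boundary data $x(0) = 1$, $x(1) = \cosh 1$ are hypothetical and are not taken from the report. The error is formed block by block (boundary mismatch plus continuity jumps at the interior mesh points), which has the same value as $\|P Y_\ell + Q Y_r - \gamma\|^2$. A thread pool stands in for the separate processors; in CPython it shows the structure of the computation rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def rk4(f, y, t0, t1, steps=50):
    """Integrate y' = f(t, y) from t0 to t1 with fixed-step RK4."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

def shooting_error(f, Y_left, mesh, A, B, alpha):
    """Shoot over every subinterval concurrently and return the squared error E."""
    N = len(mesh) - 1
    shoot = lambda j: rk4(f, Y_left[j], mesh[j], mesh[j + 1])
    with ThreadPoolExecutor() as pool:          # one task per subinterval
        Y_right = list(pool.map(shoot, range(N)))
    boundary = A @ Y_left[0] + B @ Y_right[-1] - alpha
    jumps = [Y_left[j] - Y_right[j - 1] for j in range(1, N)]  # continuity at mesh points
    residual = np.concatenate([boundary] + jumps)
    return float(residual @ residual)

# Hypothetical toy problem: y = [x, lam], x' = lam, lam' = x on [0, 1].
M = np.array([[0.0, 1.0], [1.0, 0.0]])
f = lambda t, y: M @ y
A = np.array([[1.0, 0.0], [0.0, 0.0]])   # picks x(t0)
B = np.array([[0.0, 0.0], [1.0, 0.0]])   # picks x(tf)
alpha = np.array([1.0, np.cosh(1.0)])
mesh = [0.0, 0.5, 1.0]

Y_exact = [np.array([1.0, 0.0]), np.array([np.cosh(0.5), np.sinh(0.5)])]
Y_guess = [np.array([1.0, 0.3]), np.array([1.0, 0.0])]
E_exact = shooting_error(f, Y_exact, mesh, A, B, alpha)
E_guess = shooting_error(f, Y_guess, mesh, A, B, alpha)
```

With the exact left-end values the error is zero up to integration accuracy, while an arbitrary guess gives a clearly positive value that a minimization routine would then reduce.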

Adaptive  Mesh  Selection 

To implement the parallel algorithm discussed in the previous
section, the mesh points $t_0 < t_1 < \cdots < t_N = t_f$ must be chosen. Given that
$N$ has been specified, the problem is to "optimally" select the mesh
points. The technique we shall propose for optimizing the mesh is
based upon using the local truncation error associated with any numerical
method for integrating differential equations.* Upon convergence
of the procedure, the mesh points will be optimal in the sense of
minimizing the maximum local truncation error over each subinterval.
Formally, the method is as follows:

Step 0: Let $N$ be specified. Then partition the interval $[t_0, t_f]$ into
$N$ subintervals of length $\Delta_i$, $\forall\, i = 1, 2, \ldots, N$:
$$t_0 < t_1 < t_2 < \cdots < t_{N-1} < t_N = t_f$$
Set $\ell = 0$ and $E_\ell$ equal to some large positive real number.

Step 1: Integrate the components of (2.1.2-9a) over each subinterval
and find


* The local truncation error is defined to be the norm of the difference
between the computed solution and the exact solution of an
initial-value problem. Techniques for estimating this quantity based
on Taylor series approximations may be found in reference [22].
subject to
$$\sum_{j=1}^{N} \Delta_j = t_f - t_0, \qquad \Delta_j > 0.$$
Set $\ell \leftarrow \ell + 1$ and go to Step 1.

Note that in Step 1 above the local truncation errors can
be computed simultaneously by separate arithmetic processors. By
combining the adaptive mesh selection algorithm and the parallel
shooting algorithm described in the previous section, we will have a
rapid method for accurately obtaining a MAP estimate of the states
and parameters.

This is easily achieved by augmenting $Y_\ell$
with the unknown subinterval lengths $\Delta_1, \ldots, \Delta_N$ and minimizing
an error function of the form
$$E = \| P Y_\ell + Q Y_r - \gamma \|^2 + \| e \|^2 \tag{2.1.2-11}$$

subject  to  the  constraints  given  by 
Suppose for the system defined by eqn. (2.1-1), $w(t) = 0$
$\forall\, t \in [t_0, t_f]$. That is, process noise is omitted from the state model.
Then the fixed interval smoothed estimate of the unknown state and
parameters may be found by searching for the vector $\hat{x}(t_0)$ which
minimizes the functional
$$J = \tfrac{1}{2}\,\| \hat{x}(t_0) - m_{x_0} \|^2_{P_{x_0}^{-1}} + \tfrac{1}{2} \int_{t_0}^{t_f} \| z(t) - h[\hat{x}(t), t] \|^2_{R^{-1}(t)}\,dt \tag{2.1.3-1}$$
subject to
$$\dot{\hat{x}}(t) = f[\hat{x}(t), t] \quad \forall\, t \in [t_0, t_f]. \tag{2.1.3-2}$$


The most direct method for solving this problem would be to
initially set $\hat{x}(t_0)$ to $m_{x_0}$, integrate eqn. (2.1.3-2) forward in time
over the interval $[t_0, t_f]$ and evaluate the performance index (2.1.3-1).
By considering changes in the performance index due to changes in
$\hat{x}(t_0)$, one may use this information to decide if this procedure should
be repeated. Specifically, if the change in $J$ is sufficiently small,
then $\hat{x}(t_0)$ is accepted. Otherwise, the next value of $\hat{x}(t_0)$ should be
selected such that the performance index is minimized.

To speed computations, parallel integration methods may be
used to integrate eqn. (2.1.3-2), while the selection of the next
value of $\hat{x}(t_0)$ may be made using a parallel minimization method.
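
A minimal sketch of this direct procedure is given below. Everything concrete in it is an assumption made for illustration: the scalar decay model $\dot{x} = -a x$ with the unknown parameter $a$ appended to the state, a constant scalar measurement weight `r_inv`, noise-free synthetic measurements, and a simple compass (pattern) search standing in for the parallel minimization method. The trial points examined in each sweep are independent of one another, so they are the natural place for concurrent evaluation; here they are evaluated one after another.

```python
import numpy as np

def f(y):
    """Augmented dynamics: y = [x, a], x' = -a*x, a' = 0 (hypothetical model)."""
    return np.array([-y[1] * y[0], 0.0])

def simulate(y0, ts):
    """Integrate the augmented state forward over the grid ts with RK4."""
    ys = [np.asarray(y0, dtype=float)]
    for k in range(len(ts) - 1):
        h, y = ts[k + 1] - ts[k], ys[-1]
        k1 = f(y); k2 = f(y + h / 2 * k1); k3 = f(y + h / 2 * k2); k4 = f(y + h * k3)
        ys.append(y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(ys)

def cost(y0, ts, z, m0, P0_inv, r_inv):
    """Performance index of the form (2.1.3-1) with h[x, t] = x."""
    ys = simulate(y0, ts)
    d = np.asarray(y0, dtype=float) - m0
    g = r_inv * (z - ys[:, 0]) ** 2
    integral = np.sum((g[1:] + g[:-1]) * np.diff(ts)) / 2   # trapezoidal rule
    return 0.5 * d @ P0_inv @ d + 0.5 * integral

def pattern_search(fun, y, step=0.5, tol=1e-6):
    """Compass search: try +/- step along each axis, halve the step on failure."""
    y = np.asarray(y, dtype=float)
    best = fun(y)
    while step > tol:
        trials = [y + s * step * e for e in np.eye(len(y)) for s in (1.0, -1.0)]
        vals = [fun(t) for t in trials]       # independent evaluations
        k = int(np.argmin(vals))
        if vals[k] < best:
            y, best = trials[k], vals[k]
        else:
            step /= 2
    return y, best

ts = np.linspace(0.0, 3.0, 61)
z = 2.0 * np.exp(-0.7 * ts)                   # synthetic data: x(0) = 2, a = 0.7
m0 = np.array([1.5, 0.5])                     # a priori mean, used as the first guess
P0_inv = 1e-6 * np.eye(2)                     # nearly flat prior
estimate, J_min = pattern_search(lambda y: cost(y, ts, z, m0, P0_inv, 1.0), m0)
```

Starting from the a priori mean, the search should move the estimate of the initial state and parameter toward the values that generated the data, since the prior weight chosen here is very small.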

An example illustrating this procedure is given in Section
5.2.1 of this thesis. For now, however, let us devote our attention to
The terminal time, $t_f$, may be either prescribed or be an unspecified
problem parameter. It is assumed that the initial state vector $x(t_0)$
and $M$ components, $0 \le M \le n$, of the final state vector, $x(t_f)$, are known,
i.e., $x(t_0) = x_0$ and, for $M > 0$, $x_i(t_f) = \alpha_i$, $1 \le i \le M$.

The optimal control problem is that of finding a control
vector, $u(t)$, from some admissible set, $u(t) \in U$ $\forall\, t \in [t_0, t_f]$,
which minimizes a performance index
$$J = \phi(x(t_f)) + \int_{t_0}^{t_f} L(x(\sigma), u(\sigma), \sigma)\,d\sigma \tag{2.2.1-2}$$


subject to the differential constraints (2.2.1-1) and the above
boundary conditions. Observe that, if $M$ components of the final state
vector are known, then these quantities need not be incorporated in
the penalty function, $\phi(x(t_f))$, shown in eqn. (2.2.1-2). The solution,
$u(t) = u^*(t)$, of the optimal control problem is called the optimal
control and is assumed to exist and to be unique.

In practice, it is rather difficult to find the optimal
control since to do so it is necessary to solve a highly nonlinear
two-point boundary value problem (NTPBVP). Since the solution of
NTPBVP's is often very time consuming, the role of parallelism might
be to reduce the computational burden associated with solving NTPBVP's.
This idea is pursued further in the next section.

2.2.2 Optimal Control Algorithms


For a control signal to be optimal, Pontryagin's Minimum
Principle [23] indicates that the control must minimize the Hamiltonian
defined as
$$H(x(t), \lambda(t), u(t)) = L(x(t), u(t), t) + \lambda^T(t)\, f(x(t), u(t), t) \tag{2.2.2-1}$$
where $\lambda(t) \in R^n$ is the costate or adjoint vector. Let $u^*(t)$ be an
element of $U$ and let $x^*(t)$ be one solution of eqn. (2.2.1-1) which is
dependent on $x^*(t_0) = x_0$ and $u(t) = u^*(t)$. In order for $u^*(t)$ to be
the optimal control the following necessary conditions for optimality
must be satisfied.

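$$\dot{x}^*(t) = \frac{\partial H}{\partial \lambda}(x^*(t), \lambda^*(t), u^*(t)) \tag{2.2.2-2}$$
$$\dot{\lambda}^*(t) = -\frac{\partial H}{\partial x}(x^*(t), \lambda^*(t), u^*(t)) \tag{2.2.2-3}$$
$$x^*(t_0) = x_0, \qquad x_i^*(t_f) = \alpha_i, \quad i = 1, 2, \ldots, M \tag{2.2.2-4}$$
$$\lambda_i^*(t_f) = \phi_{x_i}(x^*(t_f)), \quad i = M+1, M+2, \ldots, n \tag{2.2.2-5}$$
$$H(x^*(t), \lambda^*(t), u^*(t)) = \min_{u(t) \in U} H(x^*(t), \lambda^*(t), u(t)) \tag{2.2.2-6}$$
$$u^*(t) \in U \tag{2.2.2-7}$$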

These  necessary  conditions  may  be  used  to  solve  many  problems  of 
interest  in  optimal  control  theory.  In  particular,  our  efforts  will 
focus  on  developing  parallel  algorithms  to  solve  specified  terminal 
time  problems,  bounded  control  problems  and  free  terminal  time  problems. 

2.2.2.1 Specified Terminal Time Algorithm

Let us assume that the condition
$$\frac{\partial H}{\partial u}(x(t), \lambda(t), u(t)) = 0 \tag{2.2.2.1-1}$$
can be explicitly solved for $u(t)$, the control is not subject to a
magnitude constraint and the terminal time is specified.

Under these conditions eqn. (2.2.2-6) requires that
$$\frac{\partial H}{\partial u}(x^*(t), \lambda^*(t), u^*(t)) = 0 \;\Rightarrow\; u^*(t) = h(x^*(t), \lambda^*(t)) \tag{2.2.2.1-2}$$
Now consider the system of equations
$$u(t) = h(x(t), \lambda(t)) \tag{2.2.2.1-3}$$
$$\dot{x}(t) = \frac{\partial H}{\partial \lambda}(x(t), \lambda(t), u(t)) = f(x(t), u(t), t) \tag{2.2.2.1-4}$$
$$\dot{\lambda}(t) = -\frac{\partial H}{\partial x}(x(t), \lambda(t), u(t)) = g(x(t), \lambda(t), u(t), t) \tag{2.2.2.1-5}$$
with boundary conditions
$$x(t_0) = x_0, \qquad x_i(t_f) = \alpha_i, \quad i = 1, 2, \ldots, M$$
$$\lambda(t_0) = \lambda_0, \qquad \lambda_i(t_f) = \phi_{x_i}(x(t_f)), \quad i = M+1, M+2, \ldots, n.$$


Let $\Omega$ represent that set of vectors $\lambda_0$ for which the system
(2.2.2-2) and (2.2.2-3), with $u(t)$ given by (2.2.2-6), has a unique
solution $\forall\, t \in [t_0, t_f]$. Then for each $\lambda_0 \in \Omega$, there corresponds a
unique non-negative value for the scalar function
$$E = \sum_{i=1}^{M} (x_i(t_f) - \alpha_i)^2 + \sum_{i=M+1}^{n} (\lambda_i(t_f) - \phi_{x_i}(x(t_f)))^2 \tag{2.2.2.1-6}$$
The function $E$ will be referred to as an error function.


Notice that if one could find $\lambda_0^* \in \Omega$ such that the forward
integration of eqns. (2.2.2.1-4) and (2.2.2.1-5) leads to $E = 0$, then
the resultant solution of eqns. (2.2.2.1-4) and (2.2.2.1-5) subject to
eqn. (2.2.2.1-3) would satisfy the necessary conditions (2.2.2-2),
(2.2.2-3), (2.2.2-4), (2.2.2-5) and (2.2.2.1-2). The associated
control vector, $u(t)$, as specified by eqn. (2.2.2.1-3) would, therefore,
be taken as the optimal control for the original optimal control
problem. Since $x(t_0)$ is assumed to be known, the problem of finding
$\lambda_0^*$ and hence of solving the optimal control problem is equivalent to
the problem of minimizing the error function given by eqn. (2.2.2.1-6).




Since the initial state and terminal time are specified, this
may be accomplished by iteratively updating the initial costate until
the error function (2.2.2.1-6) is minimized. It may happen, however, that
for a given initial costate the solution of eqns. (2.2.2.1-4) and
(2.2.2.1-5) together with eqn. (2.2.2.1-3) becomes excessive in
magnitude for $t < t_f$. In order to cope with such situations, a technique
used by Isaacs [24] or the method of parallel shooting can be
adopted.

Isaacs Procedure:

If $\lambda_0 \notin \Omega$, then there will, in general, exist some $t' < t_f$
to the left of which the solution of eqns. (2.2.2.1-4) and
(2.2.2.1-5) together with eqn. (2.2.2.1-3) remains computable.
Consider then the optimal control problem which is identical to
the original problem except that $t_f$ is replaced by $t'$. Using $\lambda_0$
as the "priming guess" at the optimal initial costate for this
modified optimal control problem, a solution can be obtained and
the resulting initial costate, $\lambda_0^*$, is taken as a candidate for
membership in $\Omega$. If $\lambda_0^* \in \Omega$, $t_f$ may be replaced by its original value
and the solution to the original optimal control problem can be
pursued.

If, however, $\lambda_0^* \notin \Omega$, then there will, in general, exist $t''$,
$t' < t'' < t_f$, such that the solution of eqns. (2.2.2.1-4) and (2.2.2.1-5)
together with eqn. (2.2.2.1-3) remains computable to the left
of $t''$. A new optimal control problem in terms of $t''$ is then posed
and the process is repeated.
Experience with Isaacs' method indicates that this method
is particularly well suited only for problems with relatively short
mission times. This is a result of the fact that, if the mission
time is too long, the sensitivity of the solution to small changes in
the initial costate becomes too great for the Isaacs method to
overcome. For short mission times, however, convergence to an
element, $\lambda_0^* \in \Omega$, is quite rapid. Convergence may be accelerated
still further if parallelism is introduced when integrating the state
and costate equations forward in time. Also, the selection of the
next value of the initial costate can be made using a parallel
minimization procedure.

Parallel Shooting Solution of Optimal Control Problems

In  some  cases,  the  problem  under  consideration  may  be  sen¬ 
sitive  to  small  perturbations  in  the  initial  costate  and,  as  a  result, 
convergence  to  an  optimal  solution  may  be  slow  (if  convergence  occurs 
at  all).  In  this  situation,  parallel  shooting  has  proven  to  be  very 
effective  in  alleviating  such  problems.  By  invoking  the  principle  of 
duality,  the  parallel  shooting  procedure  described  in  Section  2.1.2 
to  solve  optimal  estimation  problems  may  be  employed  to  solve  optimal 
control  problems  also. 

To illustrate the parallel shooting procedure for optimal
control problems, eqns. (2.2.2.1-4) and (2.2.2.1-5) are concatenated:
$$\begin{pmatrix} \dot{x}(t) \\ \dot{\lambda}(t) \end{pmatrix} = \begin{pmatrix} f(x(t), u(t), t) \\ g(x(t), \lambda(t), u(t), t) \end{pmatrix} \tag{2.2.2.1-7a}$$
$$A \begin{pmatrix} x(t_0) \\ \lambda(t_0) \end{pmatrix} + B \begin{pmatrix} x(t_f) \\ \lambda(t_f) \end{pmatrix} = \alpha \tag{2.2.2.1-7b}$$


where $\alpha$ is a $2n$ vector of known boundary conditions and the elements
of the $2n \times 2n$ matrices, $A$ and $B$, are chosen such that eqn. (2.2.2.1-7b)
is satisfied. By defining $y(t) \triangleq [x^T(t), \lambda^T(t)]^T$ and partitioning the interval
$[t_0, t_f]$ into $N$ subintervals $t_0 < t_1 < \cdots < t_N = t_f$, the NTPBVP
represented by eqn. (2.2.2.1-7) can be written as
$$\dot{y}_j = F_j(y_j, t), \quad t \in [t_j, t_{j+1}], \quad j = 0, 1, \ldots, N-1 \tag{2.2.2.1-8a}$$
$$A\,y_0(t_0) + B\,y_{N-1}(t_f) = \alpha \tag{2.2.2.1-8b}$$
where
$$y_j(t) \triangleq \begin{cases} y(t), & t_j \le t \le t_{j+1} \\ 0, & \text{otherwise} \end{cases}$$

Since $y(t)$ is required to be continuous at the partition points, it is
necessary that
$$y_{j-1}(t_j) = y_j(t_j), \quad j = 1, 2, \ldots, N-1 \tag{2.2.2.1-9}$$
Combining eqns. (2.2.2.1-8) and (2.2.2.1-9) results in the following
NTPBVP:
$$\dot{Y}(t) = F(Y(t), t), \quad t \in [t_0, t_f] \tag{2.2.2.1-10a}$$
$$P Y_\ell + Q Y_r = \gamma \tag{2.2.2.1-10b}$$

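Here $Y(t)$, $F(Y(t), t)$, $Y_\ell$, $Y_r$ and $\gamma$ are $2nN$ vectors defined as:
$$Y(t) \triangleq \begin{pmatrix} y_0(t) \\ y_1(t) \\ \vdots \\ y_{N-1}(t) \end{pmatrix}, \quad
F(Y(t), t) \triangleq \begin{pmatrix} F_0(y_0, t) \\ F_1(y_1, t) \\ \vdots \\ F_{N-1}(y_{N-1}, t) \end{pmatrix}, \quad
Y_\ell \triangleq \begin{pmatrix} y_0(t_0) \\ y_1(t_1) \\ \vdots \\ y_{N-1}(t_{N-1}) \end{pmatrix}, \quad
Y_r \triangleq \begin{pmatrix} y_0(t_1) \\ y_1(t_2) \\ \vdots \\ y_{N-1}(t_f) \end{pmatrix}, \quad
\gamma \triangleq \begin{pmatrix} \alpha \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
with $P$ and $Q$ being $2nN \times 2nN$ matrices of the form
$$P = \begin{pmatrix} A & 0 & \cdots & 0 \\ 0 & I & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & I \end{pmatrix}, \qquad
Q = \begin{pmatrix} 0 & 0 & \cdots & 0 & B \\ -I & 0 & \cdots & 0 & 0 \\ 0 & -I & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & -I & 0 \end{pmatrix}$$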

In view of this formulation, the parallel variation of
extremals algorithm considers the selection of $Y_\ell$ to minimize
$$E \triangleq \| P Y_\ell + Q Y_r - \gamma \|^2 \tag{2.2.2.1-11}$$
subject to the dynamic constraint (2.2.2.1-10a). This defines a new
optimization problem involving constant rather than time varying
unknowns. Observe that finding the vector $Y_\ell$ such that $E = 0$ is
equivalent to satisfying the necessary conditions for optimality.

Also, it is important to note that the components of eqn. (2.2.2.1-10a)
are uncoupled and hence may be integrated by simultaneously shooting
over each subinterval.

In summary then, the procedure to be followed is:

Step 1: Arbitrarily choose the components of $Y_\ell$.

Step 2: Integrate the components of eqn. (2.2.2.1-10a) simultaneously
starting at $Y_\ell$ using a parallel integration scheme.

Step 3: Evaluate the error function $E \triangleq \| P Y_\ell + Q Y_r - \gamma \|^2$.

Step 4: If the error function is sufficiently small, then $Y_\ell$ is
accepted. Otherwise, update $Y_\ell$ such that $E$ is minimized
by using a parallel minimization procedure.

Note that the parallel algorithm above reduces to a parallel
version of ordinary shooting if only one subinterval is used. In this
case, however, the algorithm may still be considered a parallel method
since the differential equations may be integrated using a parallel
integration scheme. Also, the partition points required by this
algorithm may be optimally selected via the adaptive mesh selection
algorithm discussed in Section 2.1.2.
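
To make the block structure of the matching condition concrete, the sketch below assembles $P$ and $Q$ for given $A$, $B$ and $N$ and checks that $P Y_\ell + Q Y_r - \gamma$ is exactly the boundary mismatch stacked on top of the continuity jumps. The function name `build_P_Q` and the random test data are hypothetical; the block layout follows the description given above, with $\gamma$ holding $\alpha$ in its first $2n$ entries and zeros elsewhere.

```python
import numpy as np

def build_P_Q(A, B, N):
    """Assemble the block matrices P and Q of the matching condition P Y_l + Q Y_r = gamma."""
    m = A.shape[0]                       # m = 2n, size of one block
    P = np.zeros((m * N, m * N))
    Q = np.zeros((m * N, m * N))
    P[:m, :m] = A                        # first block row: A y_0(t_0) ...
    Q[:m, (N - 1) * m:] = B              # ... + B y_{N-1}(t_f)
    I = np.eye(m)
    for j in range(1, N):                # block row j: y_j(t_j) - y_{j-1}(t_j)
        P[j * m:(j + 1) * m, j * m:(j + 1) * m] = I
        Q[j * m:(j + 1) * m, (j - 1) * m:j * m] = -I
    return P, Q

# Consistency check on arbitrary data (N = 3 subintervals, 2n = 2).
rng = np.random.default_rng(0)
A, B = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
alpha = rng.standard_normal(2)
Y_left = rng.standard_normal((3, 2))     # rows: y_0(t_0), y_1(t_1), y_2(t_2)
Y_right = rng.standard_normal((3, 2))    # rows: y_0(t_1), y_1(t_2), y_2(t_f)

P, Q = build_P_Q(A, B, 3)
gamma = np.concatenate([alpha, np.zeros(4)])
residual = P @ Y_left.ravel() + Q @ Y_right.ravel() - gamma

blockwise = np.concatenate([
    A @ Y_left[0] + B @ Y_right[2] - alpha,   # boundary condition
    Y_left[1] - Y_right[0],                   # continuity at t_1
    Y_left[2] - Y_right[1],                   # continuity at t_2
])
```

For $N = 1$ the construction gives $P = A$ and $Q = B$, which is the ordinary shooting case mentioned above.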

2.2.2.2 Bounded Control Algorithm

The techniques described in the previous section can be
extended to problems with control constraints of the form:
$$|u_i(t)| \le \beta_i \quad \forall\, t \in [t_0, t_f], \quad i = 1, 2, \ldots, r \tag{2.2.2.2-1}$$
The method for handling constraints of this type is based on the fact
that for each $t \in [t_0, t_f]$, $u^*(t)$ is either on its boundary or
$\frac{\partial H}{\partial u}(x^*(t), \lambda^*(t), u^*(t)) = 0$. Consequently, in the evaluation of the
error functions described in Section 2.2.2.1, $u(t)$ is replaced by
$$u_i(t) = \begin{cases} h_i(x(t), \lambda(t)), & \text{for } |h_i(x(t), \lambda(t))| \le \beta_i \\ \beta_i\,\mathrm{sgn}(h_i(x(t), \lambda(t))), & \text{otherwise} \end{cases} \qquad i = 1, 2, \ldots, r \tag{2.2.2.2-2}$$


The approach can also be extended to some cases where
$\frac{\partial H}{\partial u}(x(t), \lambda(t), u(t)) = 0$ cannot be explicitly solved for the control.*
For example, consider
$$H(x(t), \lambda(t), u(t)) = F_1(x(t), \lambda(t)) + F_2(x(t), \lambda(t))\,u(t) \tag{2.2.2.2-3}$$
where $|u_i(t)| \le \beta_i$, $i = 1, 2, \ldots, r$ $\forall\, t \in [t_0, t_f]$.
In this case, the extremization required by eqn. (2.2.2-6) is carried
out directly. Thus, eqn. (2.2.2.1-3) is replaced by
$$u(t) = -\beta\,\mathrm{sgn}(F_2(x(t), \lambda(t))) \tag{2.2.2.2-4}$$
Note that this technique requires that
$$F_2(x(t), \lambda(t)) \ne 0 \quad \forall\, t \in [t_0, t_f]$$
since eqn. (2.2.2.2-4) would be undefined wherever $F_2$ vanishes. Note that if
this occurs on the optimal trajectory, the problem is called a
singular control problem.
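
The two control substitutions described in this section, the saturated law (2.2.2.2-2) and the switching law (2.2.2.2-4), can be written componentwise as in the sketch below. The function names are hypothetical, and the arguments `h` and `F2` are assumed to be the already evaluated values of $h(x(t), \lambda(t))$ and $F_2(x(t), \lambda(t))$ at one time instant. Raising an error when a component of $F_2$ is zero is one possible way to flag the singular case; the source does not prescribe how to handle it.

```python
import numpy as np

def saturated_control(h, beta):
    """Eqn. (2.2.2.2-2): use h_i inside the bound, beta_i * sgn(h_i) outside it."""
    h = np.asarray(h, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return np.where(np.abs(h) <= beta, h, beta * np.sign(h))

def bang_bang_control(F2, beta):
    """Eqn. (2.2.2.2-4): u = -beta * sgn(F2); undefined where F2 vanishes."""
    F2 = np.asarray(F2, dtype=float)
    if np.any(F2 == 0.0):
        raise ValueError("F2 vanishes: singular control case, eqn. (2.2.2.2-4) is undefined")
    return -np.asarray(beta, dtype=float) * np.sign(F2)

u_sat = saturated_control([0.5, -3.0, 2.0], [1.0, 1.0, 1.0])
u_bang = bang_bang_control([2.0, -0.1], [1.0, 3.0])
```

In an error-function evaluation these functions would simply replace the unconstrained expression for $u(t)$ at every integration step.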

* Note that the bang-bang control problems fall into this category.

2.2.2.3 Free Terminal Time Algorithm

To accommodate problems when $t_f$ is free, we utilize the
necessary condition
$$H(x^*(t), \lambda^*(t), u^*(t)) = 0 \quad \forall\, t \in [t_0, t_f^*] \tag{2.2.2.3-1}$$
In particular, since the Hamiltonian must be zero at $t = t_0$, the
following constraint must be satisfied.
$$H(x(t_0), \lambda(t_0), u(t_0)) = 0 \tag{2.2.2.3-2}$$
Incorporating the constraint given by eqn. (2.2.2.1-3) into eqn.
(2.2.2.3-2) results in
$$H(x(t_0), \lambda(t_0), h(x(t_0), \lambda(t_0))) = 0 \tag{2.2.2.3-3}$$

For computational purposes, we restrict our consideration to those
cases where eqn. (2.2.2.3-3) can be explicitly solved for one of the
components of the initial costate vector. For convenience, assume
this to be the first component of $\lambda(t_0)$.† From eqn. (2.2.2.3-3), it can
be seen that $\lambda_1(t_0)$ is a unique function of $x(t_0)$ and the remaining
components of the initial costate vector.

Let this value of $\lambda_1(t_0)$ be defined by:
$$\lambda_1(t_0) = \lambda_1(\bar{\lambda}(t_0), x(t_0)) \tag{2.2.2.3-4}$$
where the $(n-1)$-vector $\bar{\lambda}(t_0)$ is defined as
$$\bar{\lambda}(t_0) = (\lambda_2(t_0), \lambda_3(t_0), \ldots, \lambda_n(t_0))^T$$

In view of eqn. (2.2.2.3-4) and the fact that $t_f$ is unspecified, a
suitable error function which must be minimized by selecting $t_f$ and
$\bar{\lambda}(t_0)$ would be
$$E = \sum_{i=1}^{M} (x_i(t_f) - \alpha_i)^2 + \sum_{i=M+1}^{n} (\lambda_i(t_f) - \phi_{x_i}(x(t_f)))^2 \tag{2.2.2.3-5}$$
Clearly, this error function can be viewed as a function of the
following n-vector:
$$(t_f, \bar{\lambda}(t_0)) = (t_f, \lambda_2(t_0), \lambda_3(t_0), \ldots, \lambda_n(t_0)).$$
Observe that if one could find $(t_f, \bar{\lambda}(t_0))$ such that $E = 0$
subject to the constraints of eqns. (2.2.2.1-3), (2.2.2.1-4), (2.2.2.1-5),
(2.2.2.1-2) and (2.2.2.3-1), then the corresponding control
vector as given by eqn. (2.2.2.1-3) would be the optimal control for
the original optimal control problem. Notice that this is equivalent
to minimizing the error function (2.2.2.3-5) subject to the constraints
of eqns. (2.2.2.1-3), (2.2.2.1-4), (2.2.2.1-5) and (2.2.2.3-4).

† If this is not the case, we simply reorder the components of the
initial costate vector.

In summary, free terminal time problems may be solved using
the following parallel algorithm:

Step 1: Arbitrarily select the components of the n-vector $(t_f, \bar{\lambda}(t_0))$.

Step 2: Using $x(t_0)$ and $\bar{\lambda}(t_0)$, evaluate $\lambda_1(t_0)$ using eqn. (2.2.2.3-4).

Step 3: Compute $u(t_0)$ from eqn. (2.2.2.1-3) and use a parallel
integration method to integrate the components of eqns. (2.2.2.1-4)
and (2.2.2.1-5) starting with $x(t_0)$, $\lambda_1(t_0)$ and $\bar{\lambda}(t_0)$ over
the interval $[t_0, t_f]$.

Step 4: At time $t_f$, evaluate the error function given by eqn.
(2.2.2.3-5).

Step 5: If the error function is sufficiently small, stop; otherwise
use a parallel minimization algorithm to update $(t_f, \bar{\lambda}(t_0))$
such that eqn. (2.2.2.3-5) is minimized.
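
As a worked illustration of Step 2, consider a hypothetical double-integrator problem that is not taken from the report: $\dot{x}_1 = x_2$, $\dot{x}_2 = u$ with integrand $L = 1 + \tfrac{1}{2}u^2$ and free $t_f$. The derivation below shows how eqn. (2.2.2.3-3) is solved for the first initial costate component in that case.

```latex
\begin{align*}
H &= 1 + \tfrac{1}{2}u^2 + \lambda_1 x_2 + \lambda_2 u, \\
\frac{\partial H}{\partial u} &= u + \lambda_2 = 0
  \;\Rightarrow\; u = h(x, \lambda) = -\lambda_2, \\
H\bigl(x(t_0), \lambda(t_0), h(x(t_0), \lambda(t_0))\bigr)
  &= 1 - \tfrac{1}{2}\lambda_2^2(t_0) + \lambda_1(t_0)\,x_2(t_0) = 0, \\
\lambda_1(t_0) &= \frac{\tfrac{1}{2}\lambda_2^2(t_0) - 1}{x_2(t_0)},
  \qquad x_2(t_0) \neq 0 .
\end{align*}
```

Here $\bar{\lambda}(t_0)$ reduces to the single component $\lambda_2(t_0)$, so the search in Step 5 is over the pair $(t_f, \lambda_2(t_0))$. If $x_2(t_0) = 0$ the condition no longer involves $\lambda_1(t_0)$ and instead fixes $\lambda_2(t_0) = \pm\sqrt{2}$, which is the situation in which the components of the initial costate would be reordered.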

2.2.3 Suboptimal Control Algorithm

In the previous section, various parallel methods were
discussed which could be used to design optimal control systems.
Although a controller designed using these methods is optimal in the
sense of satisfying the necessary conditions for optimality, the
resultant control system is open loop and, as such, may be very
sensitive to environmental disturbances. Thus, it seems appropriate to
consider methods for designing a closed loop control system to
overcome such problems. With this in mind, the following parallel method
is proposed.

Suppose the controller is constrained to be of the form
$$u(t) = h[x(t), t] \quad \forall\, t \in [t_0, t_f] \tag{2.2.3-1}$$
where $h[x(t), t]$ is assumed to be continuous and specified by the
control system designer up to a set of constants. For example, to design
a linear feedback controller one might select
$$u(t) = h[x(t), t] = k\,x(t) \tag{2.2.3-2}$$
where $k$ is a gain matrix whose elements must be determined such that
the closed loop system is stable. Once the structure of the controller
has been specified, the problem is simply to find a finite number of
constants which minimize:
$$J = \phi(x(t_f), t_f) + \int_{t_0}^{t_f} L(x(t), h[x(t), t], t)\,dt \tag{2.2.3-3}$$
subject to the dynamic constraint given by
$$\dot{x}(t) = f[x(t), h[x(t), t], t]. \tag{2.2.3-4}$$
If we let $K = (k_1, k_2, \ldots, k_m)$ be the vector of unknown
constants to be optimized, then the optimal elements of $K$ may be
found as follows:

Step 0: Let $u(t) = h[x(t), t]$ be specified and select $K^{(0)}$ such that
the forward integration of eqn. (2.2.3-4) is stable
over the interval $[t_0, t_f]$. Set $\ell = 0$.
Step 1: Given $K^{(\ell)}$ and the known initial state $x(t_0)$, integrate
eqn. (2.2.3-4) forward in time over the interval $[t_0, t_f]$
using a parallel integration scheme.

Step 2: Evaluate the performance index:
$$J = \phi(x(t_f), t_f) + \int_{t_0}^{t_f} L(x(t), u(t), t)\,dt$$

Step 3: If $|J^{(\ell+1)} - J^{(\ell)}| < \varepsilon$, then the current value of $K$ is accepted
and the procedure is terminated. Otherwise, use a parallel
minimization procedure to update $K$ such that $J$ is minimized.
Then set $\ell = \ell + 1$ and return to Step 1.
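
The gain optimization loop can be sketched for a scalar example as follows. All specifics are assumptions chosen for illustration: the plant $\dot{x} = a x + u$ with $a = 1$, the feedback $u = kx$, the integrand $L = x^2 + u^2$ with no terminal penalty, and a one-dimensional compass search in place of a parallel minimization procedure. The stopping test here uses the search step size rather than the change in $J$ of Step 3, which is a simplification. The running cost is integrated alongside the state so that one forward pass yields $J$.

```python
import numpy as np

def performance_index(k, a=1.0, x0=1.0, T=5.0, steps=500):
    """Integrate x' = a*x + u, u = k*x, and accumulate J = int (x^2 + u^2) dt (RK4)."""
    def rhs(s):
        x = s[0]
        u = k * x
        return np.array([a * x + u, x * x + u * u])
    s = np.array([x0, 0.0])              # s = [state, running cost]
    h = T / steps
    for _ in range(steps):
        r1 = rhs(s)
        r2 = rhs(s + h / 2 * r1)
        r3 = rhs(s + h / 2 * r2)
        r4 = rhs(s + h * r3)
        s = s + h / 6 * (r1 + 2 * r2 + 2 * r3 + r4)
    return s[1]

def optimize_gain(k0, step=0.5, tol=1e-6):
    """Update the gain until no trial of size >= tol improves J."""
    k, J = k0, performance_index(k0)
    while step > tol:
        trials = [k + step, k - step]
        vals = [performance_index(t) for t in trials]   # independent evaluations
        i = int(np.argmin(vals))
        if vals[i] < J:
            k, J = trials[i], vals[i]
        else:
            step /= 2
    return k, J

k_opt, J_opt = optimize_gain(-2.0)       # k0 = -2 stabilizes the loop (a + k0 < 0)
```

For this example the horizon is long compared with the closed loop time constant, so the optimized constant gain should lie close to the well-known infinite-horizon value $-(a + \sqrt{a^2 + 1}) \approx -2.414$.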

Clearly,  the  simplicity  of  this  method  and  the  fact  that  the 
controller  utilizes  feedback  makes  this  method  very  attractive.  Also, 
by  incorporating  parallel  algorithms  in  Steps  1  and  3  above,  the  com¬ 
putation  time  required  for  convergence  can  be  significantly  reduced. 
Finally,  it  should  be  noted  that  the  direct  gain  optimization  proce¬ 
dure  above  is  the  dual  of  the  direct  state  and  parameter  estimation 
algorithm  discussed  in  Section  2.1.3. 

2.3 Adaptive Control and Estimation Algorithms

For many physical processes, variations in the environment
necessitate major modifications in the control strategy to meet
operating requirements. In such cases, an adaptive control system might
be employed to provide near optimal control in spite of environmental
disturbances. In this section, an explicit adaptive control scheme
is described which employs parallel algorithms to generate a control
signal in response to parameter changes tracked by an adaptive
parameter identifier. Since, in many cases, the state variables required
by the explicit adaptive control scheme are not accessible, the
unknown states of the system must be estimated simultaneously with
the parameters.

Due  to  the  fact  that  the  parameters  may  be  rapidly  varying, 
the  role  of  parallelism  would  be  to  reduce  the  computation  time  needed 
to  update  the  parameter  values,  state  estimates  and  subsequently  the 
control  law.  In  particular,  the  state  and  parameters  may  be  updated 
by  using  the  parallel  state  and  parameter  estimation  algorithms  dis¬ 
cussed  in  Section  2.1  on  a  "window"  of  measurement  data  and  then  the 
parallel  control  algorithms  of  Section  2.2  could  be  employed  to  update 
the control based upon the latest estimates of the states and
parameters.

To  this  effect,  this  section  will  be  concerned  with  devel¬ 
oping  parallel  algorithms  for  rapidly  performing  nonlinear  estimation 
and  control  in  an  adaptive  fashion.  Note  that  the  major  goal  is  to 
utilize  these  parallel  algorithms  in  an  explicit  adaptive  controller 
of  the  type  shown  in  Figure  2.1.  Hopefully,  the  use  of  parallelism 
will  permit  the  on-line  implementation  of  such  a  system.  To  this  end, 
let  us  proceed  by  formally  stating  the  adaptive  control  problem. 

2.3.1  Problem  Statement 

Consider a stochastic nonlinear dynamical system and
measurement model represented by
$$\dot{x}(t) = f[x(t), u(t), t] + G[x(t), t]\,w(t) \tag{2.3.1-1}$$
$$z(t) = h[x(t), t] + v(t) \tag{2.3.1-2}$$
where $x(t)$ is an augmented state vector which contains any unknown
parameters, $u(t)$ is a control, and $z(t)$ is a measurement vector.

FIGURE 2.1: Explicit Adaptive Control of a Nonlinear Dynamical System

The noise processes $w(t)$ and $v(t)$ are mutually independent zero-mean
white Gaussian noise processes with corresponding covariance matrices
$$E\{w(t)\,w^T(s)\} = Q(t)\,\delta(t - s)$$
$$E\{v(t)\,v^T(s)\} = R(t)\,\delta(t - s)$$
Also, it is assumed that the initial state, $x(t_0)$, is Gaussian and
uncorrelated with $w(t)$ and $v(t)$. Furthermore, consider the
performance criterion
$$J = E\left\{ \phi(x(t_f), t_f) + \int_{t_0}^{t_f} L[x(t), u(t), t]\,dt \right\} \tag{2.3.1-3}$$
where $E\{\cdot\}$ is an expectation operator. The objective is to determine
the control, $u(t)$, which minimizes eqn. (2.3.1-3) subject to the
stochastic dynamic constraints given by (2.3.1-1) and (2.3.1-2).

The  approach  we  shall  take  in  solving  this  stochastic  control 
problem  is  similar  to  that  of  Larsen  and  Tse  [id.  who  proposed  separat¬ 
ing  this  problem  into  a  deterministic  control  problem  and  a  nonlinear 
estimation  problem.  Basically,  the  approach  is  as  follows: 

Suppose for a given system, the state and a nominal set of
parameters which define the system's equations of motion are known at
the initial time. Because the structure of the estimator and controller
shown in Figure 2.1 is assumed to be known, we may set the appropriate
parameters in the adaptive estimator and controller to their nominal
values before the process to be controlled is started. When this
initialization is complete, the process is started and the control
is computed on-line and applied to the plant as the process evolves
for all $t \ge t_0$.


To account for uncertainties in the plant parameters and
disturbances, it may be necessary to adapt the control during the
mission time interval $[t_0, t_f]$. This may be accomplished on-line by
updating the control by employing the parallel control algorithms
discussed in Section 2.2 at the adaptation times $t_i$ where
$$t_i \in [t_0, t_f] \quad \forall\, i = 1, 2, \ldots$$
However, to use the algorithms of Section 2.2, an estimate of the
process parameters and unknown state variables must be available at
the time of adaptation. These estimates can be acquired by recording
the noisy observations $z(t)$ over the interval $[t_i, t_{i+1}]$ and
using the nonlinear MAP estimation algorithms discussed in Section 2.1.

The  idea  outlined  above  forms  the  basis  for  the  explicit 
adaptive  control  scheme  which  is  illustrated  by  the  timing  chart  shown 
in  Figure  2.2.  With  this  background,  we  can  proceed  to  the  next 
section,  in  which  the  details  of  the  adaptive  control  scheme  are 
presented. 

2.3.2  Direct  Explicit  Adaptive  Control 

In this section, an explicit adaptive control scheme is
presented which utilizes the direct estimation and control algorithms
discussed in Sections 2.1.3 and 2.2.3 respectively. In particular,
consider the state and measurement models given by eqns. (2.3.1-1)
and (2.3.1-2) with $w(t) = 0$ $\forall\, t \in [t_0, t_f]$. That is, no process noise
is present. It is assumed that the nominal initial state of this
process is known and a nominal set of parameters which define the
dynamics represented in eqn. (2.3.1-1) are given. Also, let us assume
that the mission time $[t_0, t_f]$ is finite with the partition
$$t_0 < t_1 < \cdots < t_{N-1} < t_N \le t_f$$
and that the control law to be optimized is linear in the
state as follows:
$$u(t) = K x(t)$$

FIGURE 2.2: Adaptive Control Timing Chart

To  implement  the  explicit  adaptive  control  scheme,  a  sequence  of 
control  and  estimation  problems  must  be  solved.  In  particular,  the 
following  must  be  solved  on-line: 

Control Problem:
$$\min_{K}\; J = \int_{t_i}^{t_f} L[x(t), Kx(t), t]\,dt, \quad i = 0, 1, 2, \ldots \tag{2.3.2-1}$$
subject to $\dot{x}(t) = f[x(t), Kx(t), t]$.

MAP Estimation Problem:
$$\min_{\hat{x}(t_i)}\; \ldots, \quad i = 0, 1, 2, \ldots \tag{2.3.2-2}$$
subject to
$\dot{\hat{x}}(t) = f[\hat{x}(t), K\hat{x}(t), t]$.

Note  that  x(t)  is  an  augmented  state  vector  which  contains 
the  unknown  parameters  to  allow  the  simultaneous  estimation  of  the 
states  and  parameters. 

In view of the above problem formulation, the following parallel
procedure might be employed to adapt the control in response to
parameter changes detected on-line.

Explicit Adaptive Control Algorithm - Direct Method

Step 0: Initialize the estimator and controller with a nominal set of
parameters and control gains. Start the process and apply the
control which is computed based upon the nominal quantities
as the process evolves for $t \ge t_0$. Set $i = 0$.

Step 1: Throughout the interval $t \in [t_i, t_{i+1}]$ record the noisy
measurements $z(t)$ and the control $u(t)$. At time $t_{i+1}$, use the
recorded values to update the estimates of $x(t)$ by minimizing
eqn. (2.3.2-2) using the direct SAP estimation algorithm
discussed in Section 2.1.3.

Step 2: Reinitialize the controller with the updated estimates of
$\hat{x}(t_{i+1})$ and reoptimize the control gains over the interval
$[t_{i+1}, t_f]$ by minimizing eqn. (2.3.2-1) using the direct gain
optimization procedure presented in Section 2.2.3.

Step 3: If $t_{i+1} < t_f$, apply the reoptimized control to the process for
$t \ge t_{i+1}$, set $i \to i+1$ and go to Step 1. Otherwise, stop since
the mission time has been exhausted.
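
The flow of Steps 0-3 can be sketched in a few lines of Python. This is only an illustrative skeleton: the callables `estimate` and `optimize_gains` are hypothetical stand-ins for the SAP estimation and gain optimization procedures of Sections 2.1.3 and 2.2.3, and the toy scalar "plant parameter" used to exercise the loop is an assumption made purely for the example.

```python
def explicit_adaptive_control(times, estimate, optimize_gains, x_hat, gains):
    """Run Steps 0-3 over the partition times = [t_0, ..., t_N], with t_N = t_f."""
    t_f = times[-1]
    schedule = [(times[0], gains)]      # Step 0: nominal gains applied from t_0
    i = 0
    while True:
        t_i, t_next = times[i], times[i + 1]
        x_hat = estimate(x_hat, gains, t_i, t_next)     # Step 1: update estimates
        gains = optimize_gains(x_hat, t_next, t_f)      # Step 2: reoptimize gains
        if t_next < t_f:                                # Step 3: adapt and continue
            schedule.append((t_next, gains))
            i += 1
        else:                                           # mission time exhausted
            return schedule, x_hat


# Hypothetical stand-ins: the estimate of an unknown parameter a moves halfway
# toward its true value on each interval, and the gain is chosen so that the
# scalar closed loop x' = (a + K) x has its pole at -1.
A_TRUE = 2.0

def toy_estimate(a_hat, gains, t_start, t_end):
    return a_hat + 0.5 * (A_TRUE - a_hat)

def toy_gains(a_hat, t_start, t_f):
    return -(a_hat + 1.0)

schedule, a_hat = explicit_adaptive_control(
    [0.0, 1.0, 2.0, 3.0, 4.0], toy_estimate, toy_gains, 1.0, -2.0)
```

Each entry of `schedule` records an adaptation time and the gain applied from that time onward, so the list has one entry per interval of the partition.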

The  optimality  of  the  control  histories  generated  according 
to  the  above  procedure  primarily  depends  upon  two  items : 

•  The  reliability  of  the  state  and  parameter  estimates  obtained  at 
the  adaptation  times. 

•  The ability of the parallel algorithms to reduce the performance
criteria given by eqns. (2.3.2-1) and (2.3.2-2).

The  stability  of  this  algorithm  depends  mainly  on  how  far 
the  actual  process  parameters  are  from  their  nominal  values  when  the 
process  is  started,  the  degree  of  parameter  variation  during  the 
mission  time  and  the  frequency  of  adaptation. 

Although the explicit adaptive control algorithm previously
described employed the direct estimation and control procedures, it
could just as easily employ the indirect methods discussed in
Sections 2.1 and 2.2. In this case, these methods would be used to
solve the NTPBVP's associated with the optimal control and estimation
problems. Note that if parallel shooting is used, the adaptation
times should correspond with the mesh points required by the
parallel shooting method. It should be emphasized that no matter
which parallel algorithm (direct or indirect) is employed to perform
the estimation and control operations, the role of parallelism is to
reduce the computation time enough to allow the on-line implementation
of the explicit adaptive control scheme.


CHAPTER  THREE 


PARALLEL  ALGORITHMS  FOR  REDUCING  COMPUTATION  TIME 

To reduce the amount of computation time associated with
the parallel algorithms described in Chapter Two, one may employ
parallel minimization algorithms and parallel methods for integrating
ordinary differential equations (ode's). Specifically, a reduction in
computation time is possible because:

•  Parallel minimization algorithms generally require fewer iterations
to minimize a function compared with serial methods.

•  Parallel integration procedures allow many of the arithmetic
operations associated with integrating ode's to be performed
simultaneously on separate processors.

In Section 3.1, a survey of parallel minimization procedures
is presented. Also in this section, a class of parallel rank-two
quasi-Newton methods is developed, which is one of the major
contributions of this thesis.

In Section 3.2, parallel integration methods are surveyed
and one of the methods is extended so that the integration step size
is automatically adjusted to maintain a desired level of accuracy
while keeping the parallel structure of the algorithm. The development
of such a parallel variable step size integration scheme is a
significant contribution in its own right.

Finally, the advantages of utilizing the new parallel methods
developed in this chapter are illustrated by comparing these methods
with some existing minimization and integration algorithms.

3.1  Parallel Minimization Algorithms

In this section, methods for unconstrained minimization
are discussed which are suitable for modern parallel computers. At the
present time, only three algorithms have been reported which possess
this feature. These methods include the nongradient algorithm of
Chazan and Miranker [25], the parallel variable metric (PVM) algorithm
reported by Straeter [26], and the parallel Jacobson-Oksman (PJO)
procedure developed by Straeter and Markos [27].

These parallel procedures are described in some detail in
Section 3.1.1 to motivate the discussion of a class of parallel
quasi-Newton (PQN) methods which is developed in Section 3.1.2. In
Section 3.1.3, the PQN method is tested by minimizing a standard set
of test functions and the performance of this new method is
demonstrated by comparing it with some popular minimization algorithms
currently in use.

3.1.1  A Survey of Parallel Algorithms for Unconstrained Minimization

In this section, three parallel algorithms for unconstrained
minimization are discussed to provide an indication of the
state-of-the-art in this area of research. The methods to be considered
include the nongradient algorithm of Chazan and Miranker [25] and the
gradient-dependent algorithms developed by Straeter [26], [27]. The
mathematical details of each parallel algorithm may be found in the
Appendix, while a brief review of their properties and shortcomings is
given in the remainder of this section.

Chazan-Miranker Algorithm

Chazan and Miranker have developed a parallel nongradient
algorithm for unconstrained minimization which is suitable for
execution on an array of parallel processors [25]. It can be shown that
this algorithm will converge for strictly convex, twice continuously
differentiable functions. Moreover, if the function to be minimized
is a quadratic in n variables, the procedure will require at most $n^2$
one-dimensional minimizations to converge. Since these one-dimensional
minimizations can be performed simultaneously using n levels of
parallelism, at most n iterations would be needed. Note that this is
significantly faster than the serial Zangwill-Powell nongradient
method [28], which requires approximately $n^2$ sequential
one-dimensional minimizations to find the minimum of a quadratic in n
variables. This implies that the speed-up due to parallelism increases
linearly with the number of processors when minimizing a quadratic by
the Chazan-Miranker algorithm.

The Chazan-Miranker algorithm is based on the properties
of conjugate directions. In fact, it can be shown that the search
direction vectors generated by this algorithm form a set of conjugate
directions. By searching along these directions, convergence
is guaranteed (at least when the function being minimized is convex).
The rate of convergence, however, depends primarily on the accuracy
of each line search and a monotone decreasing sequence tending to
zero.

With regard to the line search, provisions must be made
for allowing both positive and negative values of the linear search
parameter because the search directions generated are not necessarily
descent directions. Note that this complicates the line search
algorithm to some degree.

With regard to the monotone sequence, it has been observed
that if this sequence is selected such that it approaches zero too
rapidly, then the total number of function evaluations required to
locate the minimum becomes needlessly large and as a result, the amount
of time required for convergence increases significantly. On the other
hand, if the monotone sequence approaches zero too slowly, these
relatively large values may cause the Chazan-Miranker algorithm to
become unstable.

Experience with the Chazan-Miranker method indicates that
the performance of this method is highly dependent on the choice of
algorithm parameters, which is not very desirable.

Parallel Variable Metric Algorithm

Straeter has developed a gradient-based parallel variable metric
(PVM) algorithm which can be implemented on modern parallel computers
[26]. One of the properties of the PVM algorithm is that if the
function being minimized is a quadratic in n variables, then the
iterates will converge to the location of the minimum in one iteration
provided n levels of parallelism are used. Also, Straeter has shown
that for strictly convex functions on a finite dimensional space, the
iterates converge to the minimum provided the metrics are uniformly
bounded.

Straeter's PVM algorithm is a parallel version of Broyden's
symmetric rank-one procedure [29], which requires at most n iterations
to find the minimum of a quadratic function in n variables. Note that
when minimizing a quadratic, the PVM algorithm is n times faster than
the symmetric rank-one procedure. Although this is highly desirable
and the major reason for developing a parallel minimization procedure,
Straeter's method suffers from the same problems associated with
Broyden's symmetric rank-one procedure.

The most notable problem with all rank-one algorithms (serial
or parallel) is that the update rule used to construct the inverse
Hessian is numerically unstable. That is, the update is undefined
when certain vectors are orthogonal. Unfortunately, this occurs quite
often when applying Straeter's method to nonquadratic functions and
generally results in a nonpositive definite update. Also, Straeter's
PVM algorithm requires accurate gradient information for the method
to converge. Since the gradient of highly complex functions is
difficult at best to compute numerically, this problem may seriously
limit the application of Straeter's method.
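
The breakdown of a rank-one update can be illustrated with a small sketch. The code below uses the standard symmetric rank-one formula for an inverse-Hessian estimate, H + r r^T / (r^T y) with r = d - H y; this textbook form, the relative tolerance `tol`, and the sample vectors are assumptions chosen for illustration and are not taken from Straeter's report.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sr1_update(H, d, y, tol=1e-8):
    """Symmetric rank-one update of an inverse-Hessian estimate H.

    Returns None when the denominator (d - H y)^T y is numerically zero,
    i.e. when the correction vector is orthogonal to y.
    """
    Hy = [dot(row, y) for row in H]
    r = [di - hi for di, hi in zip(d, Hy)]        # correction vector d - H y
    denom = dot(r, y)
    scale = math.sqrt(dot(r, r) * dot(y, y))
    if abs(denom) <= tol * scale:
        return None                                # update undefined: skip it
    n = len(d)
    return [[H[i][k] + r[i] * r[k] / denom for k in range(n)] for i in range(n)]

identity = [[1.0, 0.0], [0.0, 1.0]]
ok = sr1_update(identity, [1.0, 0.0], [2.0, 0.0])      # well-defined case
broken = sr1_update(identity, [2.0, 0.0], [1.0, 1.0])  # d - H y is orthogonal to y
```

In the second call the correction vector (1, -1) is orthogonal to y = (1, 1), so the denominator vanishes and no rank-one update satisfying the secant condition exists.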

Parallel  Jacobson-Oksman  Procedure 

Another gradient-dependent method for unconstrained minimization
which exploits the parallel computing capabilities of modern
parallel computers is the parallel Jacobson-Oksman (PJO) procedure
reported by Straeter and Markos [27]. This algorithm is a modification
of the sequential Jacobson-Oksman (SJO) procedure [30] which assumes
that the function being minimized is homogeneous. Because the class
of homogeneous functions contains the quadratics as a subclass,
homogeneous functions are therefore richer than the quadratics. Moreover,
functions which have a singular Hessian at the minimum can be more
accurately approximated by a homogeneous model.

At each iteration of the PJO algorithm, a linear system of
n+2 equations must be solved. Straeter has shown that if the solution
of this linear system exists, and the function being minimized is
homogeneous in n variables, then the PJO algorithm will converge to the
minimum in one iteration provided n+2 levels of parallelism are used.
By comparison, the SJO procedure requires n+2 iterations to minimize
a homogeneous function in n variables. Straeter also shows that the
PJO algorithm will converge to the location of the minimum of any
function with a continuous, uniformly positive definite matrix of
second partial derivatives.

Although the PJO algorithm is relatively efficient, it has
been reported in [27] that in practice the PJO algorithm may not
perform better than the SJO algorithm. Straeter also indicates that the
major problem associated with the PJO algorithm is its limited
robustness. The term robustness used by Straeter refers to the relative
insensitivity of the PJO algorithm to the magnitude of the basis
vectors needed by the PJO algorithm. In fact, if the magnitude of the
basis vectors is too small, the linear system which must be solved at
each iteration may not have full rank or may be very close to being
singular. The problems cited above are not very encouraging and seem
to indicate that much care must be taken when using the PJO algorithm.

In view of the problems associated with the parallel minimization
algorithms discussed in this survey, it appears that there exists
a need to develop a more robust and dependable method for minimizing a
function of several variables. In the next section, a class of parallel
rank-two quasi-Newton methods is presented which is shown to be more
robust and dependable than currently existing procedures.


In the previous section, a survey of parallel minimization
methods was presented and the shortcomings of these methods were cited.
Since the time of their development, new results have appeared in the
literature which may be amenable to parallel computation. In particular,
Broyden [29] has introduced a family of variable metric formulae
which are useful for function minimization and have the desirable
property of quadratic termination provided accurate line searches are
used. Imbedded in Broyden's class of quasi-Newton methods is the
Davidon-Fletcher-Powell (DFP) method [31], the Broyden-Fletcher-Shanno
(BFS) method [32] and the symmetric rank-one (SR1) method [29].

Analytical and empirical studies by Dixon [33], [34] and
Himmelblau [35] indicate that the BFS rule is generally preferable to
the DFP and SR1 updates because of its reliability of convergence for
a wide class of problems. In view of these results, the remainder of
this section is concerned with restructuring Broyden's class of
quasi-Newton methods such that the modified procedure possesses a high
degree of parallelism. A particularly interesting outcome of this work
is a class of parallel double-rank quasi-Newton methods (such as a
Parallel Davidon-Fletcher-Powell (PDFP) method and a Parallel
Broyden-Fletcher-Shanno (PBFS) method), as well as a parallel version
of the symmetric rank-one method. It is felt that this new class of
parallel quasi-Newton methods potentially can be far superior to the
parallel methods surveyed in Section 3.1.1.

3.1.2.1  The Parallel Quasi-Newton Method

In this section, a gradient-dependent parallel algorithm
which employs a rank-two correction to approximate the inverse Hessian
matrix associated with Newton's method is developed. One of the
desirable properties of this new parallel minimization algorithm is that
if the function being minimized is a quadratic in n variables, then
the inverse Hessian can be constructed exactly in one iteration,
provided n+1 levels of parallelism are used. This property of the
parallel quasi-Newton (PQN) method will be proven later in this chapter.
At this time, however, it seems appropriate to formally present the
method.
Parallel Quasi-Newton Method

Given $x^{(0)}$, $H^{(0)}$, and $\Sigma = (\sigma_1, \ldots, \sigma_n) = c\,I_n$; $c > 0$,
let $\ell = 0$, $m = 2$, and perform the following steps:

Step 1:

a. Let $x_j^{(\ell)} = x^{(\ell)} + \sigma_j$. Then simultaneously compute:

$$g(x^{(\ell)}) \quad \text{and} \quad g_j = g(x_j^{(\ell)}) \qquad \forall\, j = 1, 2, \ldots, n$$

b. Simultaneously compute the gradient differences:

$$y_j = g_j - g(x^{(\ell)}) \qquad j = 1, 2, \ldots, n$$

Step 2:

Let $d_1 = \sigma_1$ and solve the following linear system for
$c_{m1}, c_{m2}, \ldots, c_{m,m-1}$:

$$\begin{bmatrix}
y_1^T d_1 & y_2^T d_1 & \cdots & y_{m-1}^T d_1 \\
y_1^T d_2 & y_2^T d_2 & \cdots & y_{m-1}^T d_2 \\
\vdots & \vdots & & \vdots \\
y_1^T d_{m-1} & y_2^T d_{m-1} & \cdots & y_{m-1}^T d_{m-1}
\end{bmatrix}
\begin{bmatrix} c_{m1} \\ c_{m2} \\ \vdots \\ c_{m,m-1} \end{bmatrix}
=
\begin{bmatrix} -y_m^T d_1 \\ -y_m^T d_2 \\ \vdots \\ -y_m^T d_{m-1} \end{bmatrix}$$

Then construct the direction vector:

$$d_m = \sigma_m + \sum_{j=1}^{m-1} c_{mj}\, \sigma_j$$

If $m < n$, set $m \to m+1$ and repeat this step. Otherwise, go to Step 3.

Step 3:

a. Compute n+1 gradients of f(x) at n+1 distinct points in
parallel:

$$g(x^{(\ell)}) \quad \text{and} \quad g_j = g(x^{(\ell)} + d_j) \qquad j = 1, 2, \ldots, n$$

b. Compute the gradient differences in parallel:

$$y_j = g_j - g(x^{(\ell)}) \qquad j = 1, 2, \ldots, n$$

Step 4:

Update $H^{(\ell)}$ using n rank-two corrections. Let $H_0^{(\ell+1)} = H^{(\ell)}$,
$\phi \in [0,1]$ and compute:

$$H_j^{(\ell+1)} = H_{j-1}^{(\ell+1)} + \frac{d_j d_j^T}{d_j^T y_j}
- \frac{H_{j-1}^{(\ell+1)} y_j\, y_j^T H_{j-1}^{(\ell+1)}}{y_j^T H_{j-1}^{(\ell+1)} y_j}
+ \phi\, v_j v_j^T \qquad j = 1, 2, \ldots, n$$

where

$$v_j = \left( y_j^T H_{j-1}^{(\ell+1)} y_j \right)^{1/2}
\left[ \frac{d_j}{d_j^T y_j} - \frac{H_{j-1}^{(\ell+1)} y_j}{y_j^T H_{j-1}^{(\ell+1)} y_j} \right]$$

Then set $H^{(\ell+1)} = H_n^{(\ell+1)}$.

Step 5:

Perform a line search in the direction s as follows:

$$\min_{\alpha} \; f(x^{(\ell)} + \alpha s)$$

where

$$s = -H^{(\ell+1)} g(x^{(\ell)})$$

and set $x^{(\ell+1)} = x^{(\ell)} + \alpha s$.

If $|f(x^{(\ell+1)}) - f(x^{(\ell)})| < \epsilon$, stop; otherwise, set $\ell \to \ell+1$, set
$\sigma_j = d_j \;\; \forall\, j = 1, 2, \ldots, n$ and go to Step 2.



It  should  be  pointed  out  that  a  fundamental  need  of  the  PQN 
method  is  the  solvability  of  the  linear  system  of  equations  shown  in 
Step  2  of  the  algorithm.  The  issue  of  solvability  will  be  analyzed  in 
the  next  section  assuming  the  function  being  minimized  is  quadratic. 
However,  a  rank  test  should  be  incorporated  into  the  linear  equation 
solver  to  test  for  solvability  at  each  step  of  the  iteration. 
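
As an illustration of Step 2 together with a simple solvability check, the following sketch builds the direction vectors for a small quadratic. It is a minimal serial sketch, not the parallel implementation: the helper names, the pivot threshold `1e-12` used as a stand-in for a rank test, and the sample matrix `A` are all assumptions introduced for the example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def solve(M, rhs):
    """Gaussian elimination with partial pivoting; raises if M is (near) singular."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        if abs(aug[p][k]) < 1e-12:          # crude rank test on the pivot
            raise ValueError("linear system in Step 2 is not solvable")
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= factor * aug[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def conjugate_directions(sigma, y):
    """Step 2: sigma[j] are the basis vectors, y[j] = g(x + sigma[j]) - g(x)."""
    n = len(sigma)
    d = [sigma[0][:]]                                    # d_1 = sigma_1
    for m in range(1, n):                                # builds d_2, ..., d_n
        M = [[dot(y[j], d[k]) for j in range(m)] for k in range(m)]
        rhs = [-dot(y[m], d[k]) for k in range(m)]
        c = solve(M, rhs)
        d.append([sigma[m][i] + sum(c[j] * sigma[j][i] for j in range(m))
                  for i in range(n)])
    return d

# Example quadratic with gradient g(x) = A x (the matrix A is an assumption).
A = [[2.0, -2.0, 0.0], [-2.0, 4.0, 0.0], [0.0, 0.0, 10.0]]
x0 = [1.0, 1.0, 1.0]
sigma = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # c = 1
g0 = matvec(A, x0)
y = [[p - q for p, q in zip(matvec(A, [xi + si for xi, si in zip(x0, s)]), g0)]
     for s in sigma]
d = conjugate_directions(sigma, y)
```

For this example the second direction comes out as (1, 1, 0), and every pair of distinct directions satisfies the conjugacy condition with respect to `A`.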

3.1.2.2  Properties and Convergence of the PQN Method

In this section, an analysis of the PQN method will be conducted
to demonstrate the properties of this algorithm and show that
the algorithm will converge in only one iteration to the minimum of a
quadratic function. If the reader is not particularly interested in
the mathematical details of the convergence proof presented in this
section, but is more interested in the performance of the PQN
algorithm, he should move on to Section 3.1.3 since the rest of this
report may be read without an understanding of the following analysis.

To begin our study of the PQN method, the following definitions
are in order:

Definition: A function $f: R^n \to R^1$ is said to be quadratic if f is of
the form:

$$f(x) = \tfrac{1}{2} x^T A x + b^T x + c$$

where the matrix A is positive definite symmetric (pds).

Definition: Let A be pds. Then a finite set of vectors $d_1, d_2, \ldots, d_n$
is said to be mutually conjugate if

$$d_i^T A\, d_k = 0 \qquad \forall\, i \ne k.$$

At this time, a number of propositions will be stated and
proved which summarize the properties of the PQN method.
Ultimately, these results will be used to prove convergence
of the PQN method.

Proposition 3.1: Let $f: R^n \to R^1$ be quadratic and $b_j$, $j = 1, 2, \ldots, n$
be an arbitrary vector. If $x_j = x + b_j$, and
$y_j = g(x_j) - g(x)$, then $y_j = A b_j$.

Proof: Since $f(x) = \tfrac{1}{2} x^T A x + b^T x + c$, we have

$$y_j = A(x + b_j) - Ax = A b_j \qquad \forall\, j = 1, 2, \ldots, n$$

At this point, it will be shown that the direction vectors
generated according to Step 2 of the PQN method form a set
of mutually conjugate directions.

Proposition 3.2: Let $f: R^n \to R^1$ be quadratic and suppose

$$\Sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n) = c\,I_n; \quad c > 0.$$

By the solvability hypothesis, however, the $c_{ij}$'s can be found
to satisfy eqn. (3.1.2.2-2). But this implies that

$$d_i^T A\, d_k = 0 \qquad \forall\, i \ne k$$

Hence, the direction vectors generated during the first iteration
of the PQN method are mutually conjugate.

Now consider all other iterations.

In Step 5 of the PQN method, $\sigma_j = d_j \;\; \forall\, j = 1, 2, \ldots, n$,
i.e., the basis vectors are set to the most recent set of
mutually conjugate directions. Hence, $y_j = A d_j$ for all remaining
iterations. Now let $\bar{d}$ denote the updated value of d. Then

$$\bar{d}_i^T A\, \bar{d}_k =
\Big( d_i + \sum_{j=1}^{i-1} c_{ij}\, d_j \Big)^T A
\Big( d_k + \sum_{\ell=1}^{k-1} c_{k\ell}\, d_\ell \Big) = 0$$

$\forall\, i = 2, 3, \ldots, n$ and $k = 1, 2, \ldots, i-1$

since the $d_j$'s are mutually conjugate.
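
For the first iteration, the link between the Step 2 linear system and conjugacy can be written out explicitly. The following is a short sketch, assuming only that $y_j = A\sigma_j$ (Proposition 3.1 applied with $b_j = \sigma_j$) and that A is symmetric:

```latex
\begin{aligned}
d_m^T A\, d_k
  &= \Big( \sigma_m + \sum_{j=1}^{m-1} c_{mj}\, \sigma_j \Big)^T A\, d_k \\
  &= (A \sigma_m)^T d_k + \sum_{j=1}^{m-1} c_{mj}\, (A \sigma_j)^T d_k \\
  &= y_m^T d_k + \sum_{j=1}^{m-1} c_{mj}\, y_j^T d_k ,
  \qquad k = 1, 2, \ldots, m-1 .
\end{aligned}
```

Requiring the right-hand side to vanish for each k gives $\sum_{j=1}^{m-1} (y_j^T d_k)\, c_{mj} = -y_m^T d_k$, which is row k of the linear system solved in Step 2. Whenever that system is solvable, $d_m$ is therefore conjugate to $d_1, \ldots, d_{m-1}$.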

The next result shows that if the function being minimized is
quadratic, then the linear system shown in Step 2 of the PQN method will
be solvable for all iterations provided it is solvable on the first
iteration. To show this and other results, the following assumption is
needed.

Assumption A.1: Henceforth in this section we will assume that the
algorithm is solvable on the first iteration.

Proposition 3.3: If $f: R^n \to R^1$ is quadratic and $\{d_i\}_{i=1}^{n}$ is a set of
mutually conjugate directions generated according to Step 2 of
the PQN method, then after one iteration of the PQN method the
coefficient matrix of the linear system in Step 2
becomes a positive definite diagonal matrix for all other
iterations.

Proof: Using the result of Proposition 3.1, and the fact that the d's
are mutually conjugate, we have

$$y_i^T d_j = d_i^T A\, d_j = 0 \qquad \forall\, i \ne j \qquad (3.1.2.2\text{-}4)$$

Also, it should be clear that

$$y_i^T d_i = d_i^T A\, d_i > 0$$

since A is positive definite symmetric. Since the off diagonal terms
of the coefficient matrix are zero and the diagonal terms are positive,
it is clearly a positive definite diagonal matrix after the initial
iteration.

The next issue to be considered is positive definiteness of
the update. That is, if we initialize the PQN method with a pds
approximation to the inverse Hessian, can it be guaranteed that the
updated inverse Hessian is pds? The answer to this question is resolved
by Proposition 3.4 below.

Proposition 3.4: If $H_{j-1}^{(\ell+1)}$ is positive definite symmetric with $\phi \ge 0$, then
$H_j^{(\ell+1)}$ is positive definite symmetric provided

$$d_j^T y_j > 0 \qquad \forall\, j = 1, 2, \ldots, n$$

Proof: The symmetric property is obvious from the form of the update
rule below:

$$H_j^{(\ell+1)} = H_{j-1}^{(\ell+1)} + \frac{d_j d_j^T}{d_j^T y_j}
- \frac{H_{j-1}^{(\ell+1)} y_j\, y_j^T H_{j-1}^{(\ell+1)}}{y_j^T H_{j-1}^{(\ell+1)} y_j}
+ \phi\, v_j v_j^T \qquad (3.1.2.2\text{-}5)$$

where

$$v_j = \left[ y_j^T H_{j-1}^{(\ell+1)} y_j \right]^{1/2}
\left[ \frac{d_j}{d_j^T y_j} - \frac{H_{j-1}^{(\ell+1)} y_j}{y_j^T H_{j-1}^{(\ell+1)} y_j} \right]$$

To show positive definiteness, the result is proved for $\phi = 0$
and then for $\phi > 0$. By direct computation, it is easy to show that
when $\phi = 0$:

$$x^T H_j^{(\ell+1)} x = x^T H_{j-1}^{(\ell+1)} x
- \frac{\left( x^T H_{j-1}^{(\ell+1)} y_j \right)^2}{y_j^T H_{j-1}^{(\ell+1)} y_j}
+ \frac{(x^T d_j)^2}{d_j^T y_j} \qquad (3.1.2.2\text{-}6)$$

Let $a = [H_{j-1}^{(\ell+1)}]^{1/2} x$ and $b = [H_{j-1}^{(\ell+1)}]^{1/2} y_j$.
Substituting these quantities into eqn. (3.1.2.2-6), we have

$$x^T H_j^{(\ell+1)} x = \frac{(a^T a)(b^T b) - (a^T b)^2}{b^T b}
+ \frac{(x^T d_j)^2}{d_j^T y_j} \qquad (3.1.2.2\text{-}7)$$

What we must show is that $x^T H_j^{(\ell+1)} x > 0 \;\; \forall\, x \ne 0$. The first
term in eqn. (3.1.2.2-7) can be shown to be positive semi-definite
using the Schwarz inequality [36] as follows:

$$(a^T a)(b^T b) - (a^T b)^2 \ge 0
\quad \Longrightarrow \quad
\frac{(a^T a)(b^T b) - (a^T b)^2}{b^T b} \ge 0$$

Also, it is clear that:

$$\frac{(x^T d_j)^2}{d_j^T y_j} \ge 0 \quad \text{since} \quad d_j^T y_j > 0$$

Thus, when $\phi = 0$, we have shown that:

$$x^T H_j^{(\ell+1)} x \ge 0 \qquad \forall\, x \ne 0$$

To show strict inequality we must show that:

$$\frac{(a^T a)(b^T b) - (a^T b)^2}{b^T b}
\quad \text{and} \quad
\frac{(x^T d_j)^2}{d_j^T y_j}$$

do not vanish simultaneously. Note that

$$\frac{(a^T a)(b^T b) - (a^T b)^2}{b^T b} = 0$$

only if a and b are collinear. But this implies that x and $y_j$ are
collinear, i.e.,

$$x = \beta\, y_j, \qquad \beta \ne 0.$$

In this case, however,

$$x^T d_j = d_j^T x = d_j^T \beta\, y_j = \beta\, d_j^T y_j \ne 0$$

since

$$d_j^T y_j > 0 \qquad \forall\, j = 1, 2, \ldots, n$$

Therefore, both

$$\frac{(a^T a)(b^T b) - (a^T b)^2}{b^T b}
\quad \text{and} \quad
\frac{(x^T d_j)^2}{d_j^T y_j}$$

can't vanish simultaneously. Hence,

$$x^T H_j^{(\ell+1)} x > 0 \qquad \forall\, x \ne 0.$$

Now suppose $\phi > 0$. In this case, the matrix $\phi\, v_j v_j^T$ is at least
positive semidefinite. Since the update given by eqn. (3.1.2.2-5)
consists of the sum of a positive definite matrix and at least a
positive semidefinite matrix, the update is positive definite.

Corollary: If $H^{(\ell)}$ is pds and $\phi \ge 0$, then $H^{(\ell+1)}$ is pds provided
$d_j^T y_j > 0 \;\; \forall\, j = 1, 2, \ldots, n$.

Proof: Since $H^{(\ell+1)}$ is obtained from a finite sum of pds matrices,
$H^{(\ell+1)}$ is pds.
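
A numerical spot check of this result can be sketched as follows. The update below implements a single rank-two correction of the form used in Step 4, and positive definiteness is tested by attempting a Cholesky factorization; the sample vectors d and y are assumptions chosen so that their inner product is positive. In the usual convention for this inverse-Hessian form, phi = 0 corresponds to the DFP rule and phi = 1 to the BFS rule.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def broyden_update(H, d, y, phi):
    """One rank-two correction of the inverse-Hessian estimate H."""
    dy = dot(d, y)
    if dy <= 0.0:
        raise ValueError("d^T y must be positive to keep H positive definite")
    Hy = matvec(H, y)
    yHy = dot(y, Hy)
    root = math.sqrt(yHy)
    v = [root * (di / dy - hi / yHy) for di, hi in zip(d, Hy)]
    n = len(d)
    return [[H[i][k] + d[i] * d[k] / dy - Hy[i] * Hy[k] / yHy + phi * v[i] * v[k]
             for k in range(n)] for i in range(n)]

def is_positive_definite(M):
    """Attempt a Cholesky factorization; success means M is positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = M[i][k] - sum(L[i][p] * L[k][p] for p in range(k))
            if i == k:
                if s <= 0.0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][k] = s / L[k][k]
    return True

H0 = [[1.0, 0.0], [0.0, 1.0]]
d, y = [1.0, 0.0], [2.0, 1.0]                 # sample data with d^T y = 2 > 0
updates = {phi: broyden_update(H0, d, y, phi) for phi in (0.0, 0.5, 1.0)}
all_pd = all(is_positive_definite(H) for H in updates.values())
```

For every value of phi tried, the updated matrix remains positive definite.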

The next result shows that the set of mutually conjugate
directions generated by the PQN algorithm is also linearly independent.

Proposition 3.5: Let $\Sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n) = c\,I_n$; $c > 0$. If
$f(x): R^n \to R^1$ is quadratic with $d_1 = \sigma_1$ and Assumption A.1 holds,
then for the PQN algorithm the set of vectors $d_1, d_2, \ldots, d_n$
are linearly independent.

Proof: Suppose there exist $\alpha_i$, $i = 1, 2, \ldots, n$ such that

$$\alpha_1 d_1 + \cdots + \alpha_n d_n = 0.$$

Then

$$\alpha_1 d_i^T A\, d_1 + \cdots + \alpha_n d_i^T A\, d_n = \alpha_i\, d_i^T A\, d_i = 0,$$

in view of the fact that the d's are mutually conjugate by Proposition
3.2. But since $d_i^T A\, d_i > 0$ due to the positive definiteness of A, $\alpha_i$
must be zero. But this is precisely what is required for the
d's to be linearly independent.


The next two propositions are particularly useful in proving
the convergence of the PQN method.

Proposition 3.6: Let $\phi \ge 0$ and $H_j^{(\ell+1)}$ be given by eqn. (3.1.2.2-5).
Then $H_j^{(\ell+1)} y_j = d_j \;\; \forall\, j = 1, 2, \ldots, n$.

Proof: By direct computation,

$$H_j^{(\ell+1)} y_j = H_{j-1}^{(\ell+1)} y_j + \frac{d_j\, d_j^T y_j}{d_j^T y_j}
- \frac{H_{j-1}^{(\ell+1)} y_j\; y_j^T H_{j-1}^{(\ell+1)} y_j}{y_j^T H_{j-1}^{(\ell+1)} y_j}
+ \phi\, v_j v_j^T y_j$$

$$= d_j + \phi\, v_j v_j^T y_j \qquad \forall\, j = 1, 2, \ldots, n$$

Thus, the proposition will be established provided we can
show $\phi\, v_j v_j^T y_j = 0$. If $\phi = 0$, the result is trivially true.
Therefore, suppose $\phi > 0$. Then

$$v_j v_j^T y_j = \left( y_j^T H_{j-1}^{(\ell+1)} y_j \right)
\left[ \frac{d_j}{d_j^T y_j} - \frac{H_{j-1}^{(\ell+1)} y_j}{y_j^T H_{j-1}^{(\ell+1)} y_j} \right]
\left[ \frac{d_j^T y_j}{d_j^T y_j} - \frac{y_j^T H_{j-1}^{(\ell+1)} y_j}{y_j^T H_{j-1}^{(\ell+1)} y_j} \right]$$

After some manipulation, it is easy to show that the last bracket is
$1 - 1 = 0$, so that:

$$\phi\, v_j v_j^T y_j = 0$$

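Proposition 3.7: If $f(x): R^n \to R^1$ is quadratic, $\phi \ge 0$, $\exists\, w \ni
A^{-1} w = H_{j-1}^{(\ell+1)} w$, and

$$B = H_{j-1}^{(\ell+1)} + \frac{d_j d_j^T}{d_j^T y_j}
- \frac{\left( H_{j-1}^{(\ell+1)} y_j \right) \left( H_{j-1}^{(\ell+1)} y_j \right)^T}{y_j^T H_{j-1}^{(\ell+1)} y_j}
+ \phi\, v_j v_j^T, \qquad d_j^T y_j > 0, \quad j = 1, 2, \ldots, n$$

where

$$v_j = \left( y_j^T H_{j-1}^{(\ell+1)} y_j \right)^{1/2}
\left[ \frac{d_j}{d_j^T y_j} - \frac{H_{j-1}^{(\ell+1)} y_j}{y_j^T H_{j-1}^{(\ell+1)} y_j} \right]$$

then

$$A^{-1} w = B w$$

Proof: Since f(x) is quadratic, from Proposition 3.1 we have:

$$A^{-1} y_j = d_j \qquad \forall\, j = 1, 2, \ldots, n.$$

By hypothesis,

$$(B - A^{-1})\, w = \left[ \frac{d_j d_j^T}{d_j^T y_j}
- \frac{\left( H_{j-1}^{(\ell+1)} y_j \right) \left( H_{j-1}^{(\ell+1)} y_j \right)^T}{y_j^T H_{j-1}^{(\ell+1)} y_j}
+ \phi\, v_j v_j^T \right] w \qquad (3.1.2.2\text{-}8)$$

Also, the assumption that

$$A^{-1} w = H_{j-1}^{(\ell+1)} w \quad \Longrightarrow \quad w^T A^{-1} = w^T H_{j-1}^{(\ell+1)}$$

Substituting $d_j = A^{-1} y_j$ and $H_{j-1}^{(\ell+1)} w = A^{-1} w$ into eqn. (3.1.2.2-8)
leads to:

$$(B - A^{-1})\, w = 0 \quad \Longrightarrow \quad A^{-1} w = B w$$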

Corollary 1: If $H_{j-1}^{(\ell+1)} w = A^{-1} w$ for some $w \in R^n$ and $f(x): R^n \to R^1$ is
quadratic, then $H_j^{(\ell+1)} w = A^{-1} w$.

Corollary 2: (Fundamental Property of H)

If $f(x): R^n \to R^1$ is quadratic, then

$$H_j^{(\ell+1)} y_k = A^{-1} y_k = d_k \qquad \forall\, k \le j = 1, 2, \ldots, n$$

The proof of Corollary 1 is obvious from Proposition 3.7 when
$B = H_j^{(\ell+1)}$. However, the proof of Corollary 2 is more subtle. To
prove Corollary 2, we shall use mathematical induction. Note that
since f(x) is quadratic, we may invoke Corollary 1 with $w = y_k$ and
Proposition 3.1 to obtain

$$H_j^{(\ell+1)} y_k = A^{-1} y_k = d_k \qquad \text{for any } k \text{ and } j = 1, 2, \ldots, n$$

However from Proposition 3.6, we have

$$H_j^{(\ell+1)} y_j = d_j \qquad j = 1, 2, \ldots, n.$$

Now let us assume $H_{j-1}^{(\ell+1)} y_k = d_k \;\; \forall\, k \le j-1$. Also, by Proposition 3.6

$$H_j^{(\ell+1)} y_k = d_k \qquad \text{for } k = j = 1, 2, \ldots, n.$$

However, using Corollary 1 of Proposition 3.7, the fact that $d_k = A^{-1} y_k$
and the inductive hypothesis, we have

$$H_j^{(\ell+1)} y_k = A^{-1} y_k = d_k \qquad \forall\, k \le j = 1, 2, \ldots, n$$

At this time, we are in a position to prove two very important
convergence theorems. The first result shows that the PQN method
converges exactly to the inverse Hessian of a quadratic function by
performing Steps 1, 2, 3, and 4, while the second result indicates that
the PQN method will minimize a quadratic function in only one iteration
(Steps 1, 2, 3, 4, and 5).

Theorem 3.1: If $f(x): R^n \to R^1$ is quadratic and Assumption A.1 holds,
then $H^{(1)} = A^{-1}$.

Proof: Let $x, z \in R^n$ and suppose $x = Az$. Then
since f(x) is quadratic with A pds, $A^{-1}$ exists so that

$$z = A^{-1} x \qquad (3.1.2.2\text{-}9)$$

From Proposition 3.5, the $d_j$'s, $j = 1, 2, \ldots, n$ are linearly
independent so they form a basis in $R^n$. Hence $\exists\, \beta_j \ni$

$$z = \sum_{j=1}^{n} \beta_j\, d_j$$

Since f is quadratic, we may write:

$$x = Az = \sum_{j=1}^{n} \beta_j\, A d_j = \sum_{j=1}^{n} \beta_j\, y_j$$

From the fundamental property of H, we have

$$H_n^{(1)} y_j = d_j \qquad j = 1, 2, \ldots, n$$

But by definition, $H^{(1)} = H_n^{(1)}$. Therefore,

$$H^{(1)} x = \sum_{j=1}^{n} \beta_j\, H^{(1)} y_j = \sum_{j=1}^{n} \beta_j\, d_j \qquad (3.1.2.2\text{-}10)$$

using the fact that $H^{(1)} y_j = d_j$. But eqns. (3.1.2.2-9) and (3.1.2.2-10)
imply that

$$H^{(1)} x = z = A^{-1} x$$

Hence, $H^{(1)} = A^{-1}$.

Theorem 3.2: If $f(x): R^n \to R^1$ is quadratic, then the PQN algorithm will
converge to the location of the minimum of f(x) in one iteration
provided Assumption A.1 holds.

Proof: Let $f(x) = \tfrac{1}{2} x^T A x + b^T x + c$. From Theorem 3.1, it is clear
that after performing Steps 1-4 of the PQN method,

$$H^{(1)} = A^{-1}.$$

Using this result in Step 5 of the PQN method results in a
single line search of the form:

$$\min_{\alpha_0} \; f(x^{(1)} + \alpha_0 s)$$

where

$$s = -H^{(1)} g(x^{(1)}) = -A^{-1} [A x^{(1)} + b] = -x^{(1)} - A^{-1} b$$

Hence, $\alpha_0$ must be found to minimize

$$f(x^{(1)} - \alpha_0 x^{(1)} - \alpha_0 A^{-1} b) \qquad (3.1.2.2\text{-}11)$$

Since f(x) is quadratic, the minimum of f(x) is located at
$x = -A^{-1} b$. Clearly, $\alpha_0 = 1$ minimizes eqn. (3.1.2.2-11) and
the updated solution is $x^{(2)} = -A^{-1} b$. Hence the procedure
converges in one iteration.

The analysis presented in this section indicates that one of the major
attributes of the PQN algorithm is that convergence will result after
one iteration when the function being minimized is quadratic. This is
significant because most highly efficient serial procedures (such as
the DFP method) can require up to n iterations to converge in such
cases.
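The one-iteration property can be illustrated numerically. The sketch below is not the PQN algorithm itself; it only assumes, for illustration, that the n displacement vectors are d_j = c e_j, that y_j is the corresponding gradient difference, and that the matrix H is built from the condition H y_j = d_j used in the proof of Theorem 3.1. The function names, the value of c, and the small Gauss-Jordan helper are hypothetical. The quadratic used is the three-variable test function of Section 3.1.3.

```python
def grad(x):
    """Gradient of f(x) = x1^2 + 2*x2^2 + 5*x3^2 - 2*x1*x2."""
    x1, x2, x3 = x
    return [2.0 * x1 - 2.0 * x2, 4.0 * x2 - 2.0 * x1, 10.0 * x3]


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Product of two small dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_hessian_estimate(x, c=1.0e-3):
    """Build H with H*y_j = d_j from n gradient differences (assumed d_j = c*e_j)."""
    n = len(x)
    g0 = grad(x)
    D = [[c if i == j else 0.0 for j in range(n)] for i in range(n)]
    Y = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Each of these n gradient evaluations is independent of the others,
        # so on a parallel machine they could be assigned to separate processors.
        gj = grad([x[i] + D[i][j] for i in range(n)])
        for i in range(n):
            Y[i][j] = gj[i] - g0[i]
    return matmul(D, invert(Y))


x_old = [1.0, 1.0, 1.0]                      # starting approximation
H = inverse_hessian_estimate(x_old)          # approximates the inverse Hessian
g = grad(x_old)
# Unit step along s = -H*g, i.e. the line search minimizer for a quadratic.
x_new = [x_old[i] - sum(H[i][k] * g[k] for k in range(3)) for i in range(3)]
```

For this function the Hessian is constant, so `H` reproduces its inverse up to rounding and `x_new` lands on the exact solution (0, 0, 0) after the single unit step.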

It should be noted that the convergence results derived in this section
assume that the function being minimized is quadratic.

For nonquadratic functions, however, the convergence properties of the
PQN algorithm will be demonstrated in the usual way by testing this new
algorithm on a set of standard test functions. This aspect is
considered in the next section.

3.1.3  Test  Function  Performance 

When a new minimization algorithm, such as the PQN method discussed in
the previous section, is developed, it is common practice to compare it
with existing methods using a standard set of test functions. Some of
the most common test functions used by researchers in this area include
the quadratic function, Rosenbrock's function, Powell's function,
Wood's function, and the Helical Valley function [26, 34]. These
functions and their properties are summarized below:

•  Quadratic  Function 

    $f(x_1, x_2, x_3) = x_1^2 + 2x_2^2 + 5x_3^2 - 2x_1 x_2$

Exact  Solution:  (0,0,0) 

Starting  Approximation:  (1,1,1) 

This  function  is  rather  easy  to  minimize  and  is  included  to 
verify  the  finite  step  convergence  property  of  quasi-Newton  methods. 

•  Rosenbrock's  Function 

    $f(x_1, x_2) = 100 (x_2 - x_1^2)^2 + (1 - x_1)^2$

Exact Solution: (1, 1)

Starting Approximation: (-1.2, 1)

Rosenbrock's function is particularly difficult to minimize since the
minimization must travel along the parabolic valley $x_2 = x_1^2$.

•  Powell's Function

    $f(x_1, x_2, x_3, x_4) = (x_1 + 10x_2)^2 + 5(x_3 - x_4)^2
                             + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4$

Exact Solution: (0, 0, 0, 0)

Starting  Approximation:  (3  ,  -1  ,  0  ,  1) 

This  function  is  difficult  for  a  variable  metric  algorithm 
to  minimize  because  at  the  minimum  the  Hessian  is  singular. 

•  Wood's  Function 

    $f(x_1, x_2, x_3, x_4) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2 + 90(x_4 - x_3^2)^2
                             + (1 - x_3)^2 + 10.1[(x_2 - 1)^2 + (x_4 - 1)^2]
                             + 19.8(x_2 - 1)(x_4 - 1)$

Exact Solution: (1, 1, 1, 1)

Starting Approximation: (-3, -1, -3, -1)

This function is difficult to minimize because the quadratics
$x_1^2 - x_2$ and $x_3^2 - x_4$ form a set of level curves which are
banana shaped.

•  Helical  Valley 

    $f(x_1, x_2, x_3) = 100[(x_3 - 10\theta)^2 + (\sqrt{x_1^2 + x_2^2} - 1)^2] + x_3^2$

where

    $2\pi\theta = \tan^{-1}(x_2/x_1)$          for $x_1 \ge 0$
    $2\pi\theta = \pi + \tan^{-1}(x_2/x_1)$    for $x_1 < 0$


Exact  Solution:  (1,0,0) 

Starting  Approximation:  (-1  ,0  ,0) 

This  function  is  rather  difficult  to  minimize  because  the 
minimum  is  located  at  the  bottom  of  a  helical  valley. 
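The five test functions listed above can be written down directly, which makes it easy to confirm the stated exact solutions. The following is a minimal sketch; the function names, the dictionary layout, and the treatment of the point x1 = 0 in the Helical Valley function (where the arctangent formula is undefined) are assumptions made for illustration.

```python
import math


def quadratic(x):
    x1, x2, x3 = x
    return x1**2 + 2.0 * x2**2 + 5.0 * x3**2 - 2.0 * x1 * x2


def rosenbrock(x):
    x1, x2 = x
    return 100.0 * (x2 - x1**2)**2 + (1.0 - x1)**2


def powell(x):
    x1, x2, x3, x4 = x
    return ((x1 + 10.0 * x2)**2 + 5.0 * (x3 - x4)**2
            + (x2 - 2.0 * x3)**4 + 10.0 * (x1 - x4)**4)


def wood(x):
    x1, x2, x3, x4 = x
    return (100.0 * (x2 - x1**2)**2 + (1.0 - x1)**2
            + 90.0 * (x4 - x3**2)**2 + (1.0 - x3)**2
            + 10.1 * ((x2 - 1.0)**2 + (x4 - 1.0)**2)
            + 19.8 * (x2 - 1.0) * (x4 - 1.0))


def helical_valley(x):
    x1, x2, x3 = x
    if x1 > 0.0:
        angle = math.atan(x2 / x1)
    elif x1 < 0.0:
        angle = math.pi + math.atan(x2 / x1)
    else:
        # Assumption: use the limiting value of the arctangent when x1 == 0.
        angle = math.copysign(math.pi / 2.0, x2)
    theta = angle / (2.0 * math.pi)
    radius = math.hypot(x1, x2)
    return 100.0 * ((x3 - 10.0 * theta)**2 + (radius - 1.0)**2) + x3**2


# name -> (function, starting approximation, exact solution)
TEST_PROBLEMS = {
    "quadratic": (quadratic, (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)),
    "rosenbrock": (rosenbrock, (-1.2, 1.0), (1.0, 1.0)),
    "powell": (powell, (3.0, -1.0, 0.0, 1.0), (0.0, 0.0, 0.0, 0.0)),
    "wood": (wood, (-3.0, -1.0, -3.0, -1.0), (1.0, 1.0, 1.0, 1.0)),
    "helical_valley": (helical_valley, (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
}
```

Each function evaluates to zero at its exact solution and to a positive value at its starting approximation, so a minimizer can be driven by iterating over `TEST_PROBLEMS`.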

The test functions described above were used to study the convergence
properties of the PQN algorithm. In particular, the PDFP and PBFS
methods were employed to minimize these test functions. Also, serial
methods such as the DFP method, as well as the PVM method, were
employed to minimize the test functions previously described. The
resulting performance of each method is summarized in Tables 3.1-3.5.

The  results  indicate: 

•  Quadratic  Function 

In this case, the parallel algorithms converged in only one iteration
while the serial methods converged after three iterations. Note that
these results are consistent with theoretical results which indicate
that the PVM, PDFP and PBFS methods must converge in one iteration
(see Table 3.1).

•  Rosenbrock's Function

For this function, the PVM and PBFS algorithms converge significantly
faster than the serial methods, but the PDFP method required more
iterations to converge than the serial DFP method (see Table 3.2).
Since each gradient evaluation requires approximately the same time as
two function evaluations, in this case, the equivalent number of
function evaluations required by each method is:

    26 x 2 + 63 = 115    for PDFP
    107 + 42 = 149       for DFP

Because the PDFP method requires fewer equivalent function evaluations
than the serial DFP method, the PDFP method will actually converge
faster than the DFP method even though more iterations are required.

•  Helical Valley

From the results shown in Table 3.3, the parallel methods require fewer
iterations to converge than the serial methods. Note that the PVM
method converged the fastest in this case.

TABLE 3.1: Minimization Algorithm Performance: Quadratic Function

TABLE 3.2: Minimization Algorithm Performance: Rosenbrock's Function

TABLE 3.3: Minimization Algorithm Performance: Helical Valley


•  Wood's  Function 


In this case, the parallel algorithms once again converged more rapidly
than the serial methods. In fact, the PVM method is nearly 50% faster
than the serial DFP method. Also, note that the PBFS method is
competitive with the PVM method this time (see Table 3.4).

•  Powell's Function

As seen from Table 3.5, each of the parallel minimization procedures
converged more rapidly than the serial methods. All of the parallel
methods performed equally well on this test function.

In summary, the results indicate that without question the parallel
minimization procedures converge more rapidly than serial methods.
Also, it appears that the PBFS method is preferable to the PDFP
procedure.

Before a recommendation can be made as to which parallel algorithm
should be generally used, a robustness study should be conducted. The
term robustness used here is a measure of the relative insensitivity of
a parallel algorithm to the magnitude of $\|\sigma_j\|$,
j = 1, 2, ..., n. The issue of robustness will be addressed by varying
the weighting parameter, c, associated with the set of linearly
independent vectors

    $\Sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n) = c I_n$ ;  c > 0

required  by  the  PVM,  PDFP,  and  PBFS  algorithms.  In  particular,  the 
robustness  of  these  algorithms  is  demonstrated  in  Figures  3. 0-3. 3, 
which  were  obtained  by  solving  the  set  of  standard  test  functions  de¬ 
scribed  earlier  with  10~^  <  c  <  10-^. 
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TABLE  3*^;  Minimization  Algorithm  Perfor 


TABLE  3-5:  Minimization  Algorithm  Performance:  Powell's  Function 


The  results  indicate: 

•  For Rosenbrock's function, the PDFP and PBFS algorithms are more
   robust than the PVM algorithm, although the PVM algorithm may
   require fewer iterations to converge (see Figure 3.1).

•  For the Helical Valley function, the PBFS and PVM algorithms are
   more robust than the PDFP algorithm. Again the PVM algorithm
   converges more rapidly than the other methods (see Figure 3.2).

•  For Wood's function, the PBFS algorithm possesses the highest degree
   of robustness. Note that for the PVM algorithm the total number of
   iterations required for convergence increases very rapidly if c is
   chosen too large. Also observe that the PDFP algorithm is more
   robust than the PVM algorithm even though more iterations are
   required for convergence (see Figure 3.3).

•  For Powell's function, the PDFP method is the most robust, although
   the parallel minimization procedures all require approximately the
   same number of iterations to converge over a wide range of c,
   $10^{-9} \le c \le 10^{-5}$ (see Figure 3.4).

In summary, the robustness study conducted here indicates that the
parallel rank-two quasi-Newton methods (PDFP, PBFS) generally are more
robust than the rank-one PVM algorithm. The results also indicate that
the PBFS algorithm might be preferable to the PDFP method. Although the
PVM algorithm generally required fewer iterations to converge, the PBFS
algorithm might be preferred in view of its superior robustness
characteristics. Finally, the results obtained clearly show that
parallel rank-two methods are more robust than parallel rank-one
methods, which was one of the major motivations for developing the PQN
method presented in Section 3.1.


FIGURE 3.1: Robustness of Parallel Minimization Algorithms: Rosenbrock's Function


FIGURE  3.3:  Robustness  of  Parallel  Minimization  Algorithms:  Wood's  Function 


3.2  Parallel Methods for Integrating Ordinary Differential Equations

In this section, parallel methods for integrating ordinary
differential equations are discussed and compared. In particular,
Downs' method [37] and the Miranker-Liniger method [38] are discussed
in Section 3.2.1. In Section 3.2.2, the Miranker-Liniger method is
extended to allow the integration step size to be automatically
adjusted so as to maintain a desired level of accuracy. To more fully
appreciate the speed and accuracy of the parallel variable step size
integration method, it is compared with existing integration methods in
Section 3.2.3.

3.2.1  A  Survey  of  Parallel  Integration  Algorithms 

Before discussing parallel procedures for solving initial-value
problems, let us first define the underlying problem and briefly
mention some possible approaches to its solution. Therefore, consider
the initial-value problem:

    $\dot{y}(t) = f[y(t), t]$,  $t > t_0$                  (3.2.1-1)
    $y(t_0) = y_0$

where the initial time, $t_0$, and the initial condition, $y_0$, are
assumed to be known.

It is assumed that f is continuous and differentiable. The exact
solution to this problem is only known for special choices of the
function f[y(t), t]. In general, however, the right-hand side (RHS) of
eqn. (3.2.1-1) is so complex that only approximate solutions may be
found.

At the present time, many numerical procedures have been proposed to
solve initial-value problems. Some of these methods include: Euler's
method, Runge-Kutta methods, and predictor-corrector methods [22].

Currently, the fourth-order Adams predictor-corrector and the
Runge-Kutta-Fehlberg methods are among the most efficient procedures
for solving initial-value problems [39]. These methods, however, are
sequential in nature and as such are not suitable for parallel
computers.

Although many parallel computers exist at the present time, only two
parallel methods for solving initial-value problems currently exist.
These methods, due to Downs [37] and Miranker & Liniger [38],
surprisingly were developed nearly a decade ago. Apparently, this area
of research may be reconsidered in the near future, but, for now, let
us discuss Downs' method.

Downs' Method

One of the first parallel methods for numerically solving an
initial-value problem was reported by Downs in reference [37]. This
method was originally designed for use on the Illiac IV although it can
be executed on any parallel computer with N processors which are
capable of operating simultaneously.

To begin our discussion of this method, let
$\pi_N = [t_0, t_1, \ldots, t_{N-1}, t_N]$ be a time partition of the
interval $[t_0, t_f]$. Associated with $\pi_N$ is a sequence of
functions which will be denoted by

    $\{y^k(t)\} \triangleq (y^1(t), y^2(t), \ldots, y^{N+1}(t))$

Basically, the approach taken by Downs is to construct a sequence
$\{y^k(t)\}_{k=1}^{N+1}$ in a recursive manner such that in the limit,
the sequence approaches the exact solution of the initial-value problem
under consideration.

In reference [37], Downs gives two methods for computing the recursion
on a parallel computer. The first method is based upon computing a
partial sum in $2 \log_2 N$ steps. Downs indicates that although this
method is quite efficient on a parallel processor, such as the
Illiac IV, convergence can be slow. The second method proposed by Downs
requires more complicated computations but usually leads to much better
convergence. This technique is based on a first-order Taylor series
which also requires only $2 \log_2 N$ steps to execute. Downs shows
that his procedure converges linearly to the exact solution of an
initial-value problem provided that the initial approximation to the
solution is sufficiently good.

The major problem with Downs' method is that the number of processors
needed to implement his procedures may indeed become prohibitive. This
is especially true if the RHS of the initial-value problem is highly
nonlinear since, in this case, the number of partition points (or
processors) associated with the time partition $\pi_N$ must be
relatively large to ensure accuracy. This, along with the fact that
Downs does not present an example illustrating the performance of his
procedure, may cause one to be reluctant to use his method.


Miranker and Liniger's Method

Miranker and Liniger's class of parallel predictor-corrector
integration methods is based upon decoupling the predictor-corrector
equations such that the calculations required by the predictor and
corrector can be performed simultaneously on separate processors [38].
This may be achieved by forcing the corrector to lag the predictor by
one time step. In fact, Miranker and Liniger have shown that if

    $t_i = t_0 + ih$,  i = 1, 2, ...

where h is an integration step size parameter and $y_i$, $y_i^p$,
$y_i^c$, $f_i^p$, and $f_i^c$ denote the value of $y(t_i)$, the
predicted value of $y(t_i)$, the corrected value of $y(t_i)$, the value
of $f[y_i^p, t_i]$, and the value of $f[y_i^c, t_i]$ respectively, then
the following predictor-corrector pairs may be derived:

Parallel Trapezoidal Rule:

    $y_{i+1}^p = y_{i-1}^c + 2h f_i^p$                                              (3.2.1-2a)
    $y_i^c = y_{i-1}^c + (h/2)(f_i^p + f_{i-1}^c)$                                  (3.2.1-2b)

Parallel Adams-Moulton (3rd order):

    $y_{i+1}^p = y_{i-1}^c + (h/3)(7 f_i^p - 2 f_{i-1}^c + f_{i-2}^c)$              (3.2.1-3a)
    $y_i^c = y_{i-1}^c + (h/12)(5 f_i^p + 8 f_{i-1}^c - f_{i-2}^c)$                 (3.2.1-3b)

Parallel Adams-Moulton (4th order):

    $y_{i+1}^p = y_{i-1}^c + (h/3)(8 f_i^p - 5 f_{i-1}^c + 4 f_{i-2}^c - f_{i-3}^c)$    (3.2.1-4a)
    $y_i^c = y_{i-1}^c + (h/24)(9 f_i^p + 19 f_{i-1}^c - 5 f_{i-2}^c + f_{i-3}^c)$      (3.2.1-4b)

It is clear from the structure of the parallel predictor-corrector
pairs above that the predictor and corrector equations may be evaluated
at the same time if two processors are available. Also, note that the
computation time may be reduced by a factor of two if one of these
methods is used rather than a conventional (serial)
predictor-corrector method.

Miranker and Liniger extend this idea of parallel operation on two
processors to parallel operation on any even number of processors. They
also analyze the stability and convergence of their class of methods by
studying the root condition and local truncation error associated with
the theory of classical multistep methods.
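The structure of the fourth-order pair, eqns. (3.2.1-4a) and (3.2.1-4b), can be sketched for a scalar problem as follows. This is only an illustration: the two updates that would run on separate processors are evaluated one after the other here, the starting values are generated with a classical Runge-Kutta step, and the function names and the test problem y' = -y are assumptions rather than part of the original method description.

```python
import math


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(y, t)."""
    k1 = f(y, t)
    k2 = f(y + 0.5 * h * k1, t + 0.5 * h)
    k3 = f(y + 0.5 * h * k2, t + 0.5 * h)
    k4 = f(y + h * k3, t + h)
    return y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def ppc4_fixed_step(f, t0, y0, h, n_steps):
    """Fixed-step parallel 4th order pair; returns mesh points and corrected values."""
    t = [t0 + i * h for i in range(n_steps + 1)]
    yc = [y0]
    for i in range(3):                       # corrector history y_1, y_2, y_3
        yc.append(rk4_step(f, t[i], yc[i], h))
    yp = rk4_step(f, t[3], yc[3], h)         # stands in for the predicted y_4
    fc = [f(yc[i], t[i]) for i in range(4)]
    for i in range(4, n_steps + 1):
        fp = f(yp, t[i])
        # The next two assignments do not depend on each other, so they
        # could be evaluated simultaneously on two processors.
        yp_next = yc[i - 1] + (h / 3.0) * (
            8.0 * fp - 5.0 * fc[i - 1] + 4.0 * fc[i - 2] - fc[i - 3])
        yc_i = yc[i - 1] + (h / 24.0) * (
            9.0 * fp + 19.0 * fc[i - 1] - 5.0 * fc[i - 2] + fc[i - 3])
        yc.append(yc_i)
        fc.append(f(yc_i, t[i]))
        yp = yp_next
    return t, yc


# Example: y' = -y, y(0) = 1, integrated to t = 1 with h = 0.01.
ts, ys = ppc4_fixed_step(lambda y, t: -y, 0.0, 1.0, 0.01, 100)
error = abs(ys[-1] - math.exp(-ts[-1]))
```

Inside the loop the corrector at step i and the predictor at step i+1 use only quantities available from earlier steps, which is what allows the two evaluations to overlap.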

Since the integration step size, h, is fixed for all time in the
parallel methods above, the accuracy and efficiency of these procedures
may be greatly influenced by the choice of h. For example, if the step
size is too large, then the resultant solution will be inaccurate. On
the other hand, if the step size is too small, then the efficiency of
the algorithm is reduced since too many integration steps would be
taken. Thus, the problem of step size selection is crucial to the
successful application of this method. This observation has led to a
modification of the basic method which is the topic of the next
section.

3.2.2  PPC  Integration  With  Variable  Step  Size 

In the previous section, a number of parallel predictor-corrector (PPC)
methods for integrating ordinary differential equations (ODEs) due to
Miranker and Liniger were presented. The primary advantage of using
these methods over other procedures is the speed-up of computation.
Although the computations required by a PPC method may be done
extremely rapidly on separate processors, the accuracy of the solution
may suffer if the step size parameter, h, is not chosen properly. Thus,
it seems appropriate to modify Miranker and Liniger's methods such that
a prescribed level of accuracy can be maintained while keeping the
parallel feature of these methods.

This modification might be realized by using the predictor and
corrector values at the same time step to estimate the local truncation
error and using this quantity to vary the step size to achieve a
prescribed accuracy. However, since the corrector lags the predictor by
one time step in the PPC pairs described in the previous section, one
might suspect problems with this approach. Indeed, this is true and has
been verified through simulation.

In view of the above, it appears that a better method might be to use
the predictor at step i+1 and the corrector at step i to estimate the
local truncation error at step i, which could be used to automatically
vary the step size.

In the remainder of this section, the ideas discussed above will be
used to extend Miranker and Liniger's class of parallel
predictor-corrector methods such that a desired level of accuracy is
maintained. Although each member of Miranker and Liniger's class of PPC
methods can be extended, the basic procedure will be demonstrated for
the parallel 4th order Adams-Moulton method given by eqns. (3.2.1-4a)
and (3.2.1-4b).


It is well known that the local truncation error associated with the
Adams-Moulton corrector (eqn. 3.2.1-4b) is given by [19]:

    $d_{i,c} = -\frac{19}{720} h^5 y^{(5)}(\xi_1)$,  $\xi_1 \in [t_{i-3}, t_i]$          (3.2.1-5)

Due to the form of eqn. (3.2.1-4a), we must estimate the local
truncation error of the predictor. Since eqn. (3.2.1-4a) is accurate to
$O(h^4)$, we will assume an exact solution of the form:

    $y(t) = t^5$,  $t \ge t_0$                                                       (3.2.1-6)

so that

    $\dot{y} = f(y, t) = 5t^4$,  $t > t_0$.                                          (3.2.1-7)

By definition, the local truncation error associated with the predictor
is given by:

    $d_{i+1,p} \triangleq y(t_{i+1}) - y_{i+1}^p$                                    (3.2.1-8)

Substituting eqns. (3.2.1-6) and (3.2.1-7) into (3.2.1-4a) and
evaluating (3.2.1-8) at $t_{i-3} = 0$, $t_{i-2} = h$, $t_{i-1} = 2h$,
$t_i = 3h$, $t_{i+1} = 4h$, the desired result is obtained:

    $d_{i+1,p} = \frac{232}{720} h^5 y^{(5)}(\xi_2)$,  $\xi_2 \in [t_{i-3}, t_{i+1}]$    (3.2.1-9)

Note that although the estimates of the local truncation errors given
by eqns. (3.2.1-5) and (3.2.1-9) are mathematically correct, they are
of little use numerically. This is due to the fact that explicit
calculation of $y^{(5)}(\xi)$ is clearly problem dependent. To
circumvent such problems, we need a better method of estimating the
local truncation error.
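The error constants of the fourth-order pair can be checked with exact rational arithmetic by carrying out the substitution described above: take y(t) = t^5, place the mesh at 0, h, 2h, 3h, 4h with h = 1, and compare each formula with the exact value. The sketch below is only a verification aid; the variable names are assumptions.

```python
from fractions import Fraction


def y(t):
    """Assumed exact solution y(t) = t^5."""
    return Fraction(t) ** 5


def f(t):
    """Right-hand side f = dy/dt = 5 t^4 along the exact solution."""
    return 5 * Fraction(t) ** 4


h = Fraction(1)
t = [k * h for k in range(5)]        # t[0..4] play the roles of t_{i-3}, ..., t_{i+1}

# Predictor (3.2.1-4a) and corrector (3.2.1-4b) evaluated with exact past values.
y_pred = y(t[2]) + (h / 3) * (8 * f(t[3]) - 5 * f(t[2]) + 4 * f(t[1]) - f(t[0]))
y_corr = y(t[2]) + (h / 24) * (9 * f(t[3]) + 19 * f(t[2]) - 5 * f(t[1]) + f(t[0]))

fifth_derivative = 120               # d^5/dt^5 of t^5
pred_const = (y(t[4]) - y_pred) / (fifth_derivative * h ** 5)
corr_const = (y(t[3]) - y_corr) / (fifth_derivative * h ** 5)
```

Here `pred_const` equals 232/720 and `corr_const` equals -19/720, so their difference is 251/720.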

To obtain such a method, we proceed by subtracting eqn. (3.2.1-5) from
(3.2.1-9), assuming $\xi_1 = \xi_2 = \xi$, to get:

    $y(t_{i+1}) - y_{i+1}^p + (y_i^c - y(t_i)) = \frac{251}{720} h^5 y^{(5)}(\xi)$    (3.2.1-10)

Using this result and eqn. (3.2.1-5), it is easy to show that:

    $d_{i,c} = -\frac{19}{251} [(y_i^c - y_{i+1}^p) + \delta_y]$                      (3.2.1-11)

where

    $\delta_y \triangleq y(t_{i+1}) - y(t_i)$

Using the Schwarz and triangle inequalities [36] on eqn. (3.2.1-11), we
obtain the upper bound on the local truncation error

    $|d_{i,c}| \le \frac{19}{251} \{|y_i^c - y_{i+1}^p| + |\delta_y|\}$              (3.2.1-12)

This result is particularly interesting because it indicates that the
bound on $|d_{i,c}|$ increases with $|\delta_y|$. This implies that to
maintain a small local error at the corrector, the step size cannot be
too large since to predict too far in advance may cause $|\delta_y|$ to
be large. Although this result is clearly what is needed,
eqn. (3.2.1-12) is not very useful as is because $|\delta_y|$ is
unknown. To overcome this difficulty, we may write a Taylor series
approximation for y(t) evaluated at $t = t_{i+1}$ as follows:

    $y(t_{i+1}) = y(t_i) + h_i f[y(t_i), t_i] + O(h_i^2)$

where

    $h_i \triangleq t_{i+1} - t_i > 0$.

Thus, we have

    $|\delta_y| \le h_i |f[y(t_i), t_i]| + |O(h_i^2)|$                               (3.2.1-13)

If we substitute eqn. (3.2.1-13) into eqn. (3.2.1-12), we have a means
of estimating the local truncation error of the corrector numerically
as follows:

    $|d_{i,c}| \le \frac{19}{251} \{|y_i^c - y_{i+1}^p| + h_i |f(y_i^c, t_i)|\}$     (3.2.1-14)

Finally, this result can be utilized to automatically vary the step
size $h_i$ until a desired accuracy is obtained. More specifically,
this may be achieved by performing the following steps:

Step 1: Simultaneously evaluate the predictor and corrector eqns.

    $y_{i+1}^p = y_{i-1}^c + \frac{h_i}{3} [8 f_i^p - 5 f_{i-1}^c + 4 f_{i-2}^c - f_{i-3}^c]$     (3.2.1-15a)
    $y_i^c = y_{i-1}^c + \frac{h_i}{24} [9 f_i^p + 19 f_{i-1}^c - 5 f_{i-2}^c + f_{i-3}^c]$       (3.2.1-15b)

Step 2: Estimate the local truncation error:

    $d_{i,c} = \frac{19}{251} \{|y_i^c - y_{i+1}^p| + h_i |f_i^c|\}$

Step 3:

a.  If $\epsilon_{\min} \le d_{i,c} \le \epsilon_{\max}$, the step is
    accepted so set $i \leftarrow i+1$ and go to Step 1.

b.  If $d_{i,c} > \epsilon_{\max}$, then the local truncation error is
    too large. Therefore, replace $h_i \leftarrow h_i/2$, restart
    method and go to Step 1.

c.  If $d_{i,c} < \epsilon_{\min}$, the solution is more accurate than
    desired. Therefore, replace $h_i \leftarrow 2h_i$, restart method,
    set $i \leftarrow i+1$, and go to Step 2.

The sequence of computations required by eqns. (3.2.1-15a) and
(3.2.1-15b) is illustrated in Figure 3.5 below.




Figure  3.5  -  The  Sequence  of  Computations  of  a 
PPC  Integration  Procedure 


The  upper  line  represents  the  progress  of  the  computation 
at  the  mesh  points  for  the  predictor  while  the  lower  line  shows  the 
progress  of  the  corrector.  The  dashed  line  is  referred  to  as  a  compu¬ 
tation  front.  The  arrows  in  Figure  3.5  indicate  that  the  computations 
at  the  mesh  points  ahead  of  the  computation  front  only  depend  on  infor¬ 
mation  behind  the  front  which  is  characteristic  of  a  parallel  integra¬ 
tion  algorithm.  The  method  can  be  implemented  by  simultaneously 


evaluating  the  following  quantities  in  separate  processors: 


Before leaving this section, a few words should be said about starting
the parallel variable step size integration scheme. Note that to start
this method the following quantities must be available to the parallel
processors:


c 


A0’ 


c 

yl’ 


and  f^ 


These values may be obtained by taking four forward integration steps
of a standard Runge-Kutta (RK) integration method since RK methods are
self-starting. The values of y(t) and f[y(t), t] computed over the
first three time steps may be used by the corrector while the value of
y(t) and f[y(t), t] at the fourth time step can be used to initialize
the predictor. At this point, enough information is available to begin
processing in parallel using the parallel variable step size
integration scheme.
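Steps 1-3, together with the Runge-Kutta starting procedure, can be sketched for a scalar problem as shown below. Several details are assumptions made only so that the sketch is complete: a "restart" is taken to mean four new Runge-Kutta steps from the last accepted point, the three Runge-Kutta points of each restart are accepted without an error test, the step size is kept between the hypothetical limits `h_min` and `h_max`, and after a doubled step the method continues from Step 1. The predictor and corrector are evaluated sequentially here rather than on two processors.

```python
import math


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(y, t)."""
    k1 = f(y, t)
    k2 = f(y + 0.5 * h * k1, t + 0.5 * h)
    k3 = f(y + 0.5 * h * k2, t + 0.5 * h)
    k4 = f(y + h * k3, t + h)
    return y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def ppc4_variable_step(f, t0, y0, t_end, h, eps_min, eps_max,
                       h_min=1.0e-6, h_max=0.5):
    """Variable-step parallel 4th order pair; returns accepted points and values."""
    ts, ys = [t0], [y0]
    while ts[-1] < t_end:
        # (Re)start: four RK steps from the last accepted point.
        t_loc, y_loc = [ts[-1]], [ys[-1]]
        for _ in range(4):
            y_loc.append(rk4_step(f, t_loc[-1], y_loc[-1], h))
            t_loc.append(t_loc[-1] + h)
        ts.extend(t_loc[1:4])                # first three steps feed the corrector
        ys.extend(y_loc[1:4])
        fc = [f(y_loc[k], t_loc[k]) for k in range(1, 4)]
        t_i, yp = t_loc[4], y_loc[4]         # fourth step initializes the predictor
        while ts[-1] < t_end:
            fp = f(yp, t_i)
            # Step 1: predictor and corrector (independent of each other).
            yp_next = ys[-1] + (h / 3.0) * (
                8.0 * fp - 5.0 * fc[2] + 4.0 * fc[1] - fc[0])
            yc = ys[-1] + (h / 24.0) * (
                9.0 * fp + 19.0 * fc[2] - 5.0 * fc[1] + fc[0])
            fc_i = f(yc, t_i)
            # Step 2: estimate of the local truncation error of the corrector.
            d = (19.0 / 251.0) * (abs(yc - yp_next) + h * abs(fc_i))
            # Step 3b: error too large, halve the step and restart.
            if d > eps_max and h > h_min:
                h = max(0.5 * h, h_min)
                break
            ts.append(t_i)                   # Steps 3a and 3c: accept the step
            ys.append(yc)
            # Step 3c: more accurate than needed, double the step and restart.
            if d < eps_min and h < h_max:
                h = min(2.0 * h, h_max)
                break
            fc = [fc[1], fc[2], fc_i]
            t_i, yp = t_i + h, yp_next
    return ts, ys


# Example: y' = -y, y(0) = 1 on [0, 2], starting with h = 0.01.
ts, ys = ppc4_variable_step(lambda y, t: -y, 0.0, 1.0, 2.0, 0.01, 5.0e-4, 2.0e-3)
max_error = max(abs(yv - math.exp(-tv)) for tv, yv in zip(ts, ys))
```

In this example the step size grows as the solution decays, so fewer points are used than the 200 a fixed step of 0.01 would need over the same interval.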

3.2.3  Comparison  of  Methods 

In order to determine the effectiveness of the parallel integration
procedures discussed in the previous sections, these methods were used
to find the solution of the forced Van der Pol equation [37]:

    $-\ddot{x}(t) + a(t)(1 - x^2(t))\dot{x}(t) - x(t) + u(t) = 0$           (3.2.3-1)

where a(t) is a parameter which defines a particular system's dynamics
and u(t) is a forcing function or control.

To solve this problem by the parallel integration procedures previously
discussed, we must write eqn. (3.2.3-1) as a system of first-order
differential equations. If we let $x_1(t) = x(t)$ and
$x_2(t) = \dot{x}(t)$, then eqn. (3.2.3-1) may be written as:

    $\dot{x}_1(t) = x_2(t)$                                                 (3.2.3-2a)
    $\dot{x}_2(t) = a(t)(1 - x_1^2(t)) x_2(t) - x_1(t) + u(t)$              (3.2.3-2b)

In the simulations, the control was selected as:

    $u(t) = \sin \frac{1}{\sqrt{2}} t$,  $t > 0$

since this input will cause x(t) to be uniformly asymptotically stable
[40]. Also, in the simulations a(t) was set to zero and the initial
conditions were chosen as:

    $x_{10} = 1$ and $x_{20} = 1 + \sqrt{2}$.

In this case, the exact solution of eqn. (3.2.3-2) is given by:

    $x_1(t) = \cos t + \sin t + 2 \sin \frac{1}{\sqrt{2}} t$
    $x_2(t) = \cos t - \sin t + \frac{2}{\sqrt{2}} \cos \frac{1}{\sqrt{2}} t$,  $t > 0$
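The closed-form solution for the case a(t) = 0 can be checked against the first-order system (3.2.3-2) and the initial conditions. The sketch below does this numerically with central differences; the function names, the sample points, and the difference step are assumptions chosen for illustration.

```python
import math

W = 1.0 / math.sqrt(2.0)     # frequency of the forcing function


def u(t):
    return math.sin(W * t)


def x1_exact(t):
    return math.cos(t) + math.sin(t) + 2.0 * math.sin(W * t)


def x2_exact(t):
    return math.cos(t) - math.sin(t) + (2.0 / math.sqrt(2.0)) * math.cos(W * t)


def max_residual(t_end=5.0, n_points=51, dt=1.0e-5):
    """Largest residual of the system with a(t) = 0 over [0, t_end]."""
    worst = 0.0
    for k in range(n_points):
        t = t_end * k / (n_points - 1)
        dx1 = (x1_exact(t + dt) - x1_exact(t - dt)) / (2.0 * dt)
        dx2 = (x2_exact(t + dt) - x2_exact(t - dt)) / (2.0 * dt)
        r1 = dx1 - x2_exact(t)                  # eqn. (3.2.3-2a)
        r2 = dx2 - (-x1_exact(t) + u(t))        # eqn. (3.2.3-2b) with a(t) = 0
        worst = max(worst, abs(r1), abs(r2))
    return worst


residual = max_residual()
x10, x20 = x1_exact(0.0), x2_exact(0.0)
```

The residual is at the level of the finite-difference error, and the values at t = 0 reproduce the initial conditions.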

At this point, Miranker and Liniger's fourth-order parallel integration
method (PPC42) and the parallel variable step integration procedure
(PPC42V) were used to obtain a numerical solution to Van der Pol's
equation when a(t) = 0. Let us denote the computed solution as
$\hat{x}(t)$ and the exact solution as x(t).

In Tables 3.6 and 3.7, $\hat{x}(t)$, x(t), and the error
$x(t) - \hat{x}(t)$ are shown over a five second interval. The results
indicate:

•  By using the PPC42 procedure with a fixed step size of
   h = 5/200 = 0.025, the computed solution is accurate to about 6
   digits (see Table 3.6), which substantiates the claim that the PPC42
   procedure is accurate to $O(h^4)$.

•  The PPC42V integration procedure can indeed vary the step size to
   meet a 5-6 digit accuracy requirement imposed by the user (see
   Table 3.7). To obtain the computed solution shown in Table 3.7, the
   PPC42V procedure only took 140 integration steps while the PPC42
   procedure required 200 integration steps.

Table 3.6: PPC42 Solution of Van der Pol's Equation when
a(t) = 0 $\forall t \in [0, 5]$

The second example which was considered is when a(t) = 1
$\forall t \in [0, 5]$. In this case, an analytical solution to the Van
der Pol equation is impossible to obtain. Nevertheless, an approximate
solution can be obtained by employing the parallel integration methods.

For this example, the results indicate that the solution obtained using
the PPC42 procedure (h = 0.005 and fixed) and the variable step PPC42V
method agree to about 5 digits (see Tables 3.8 and 3.9).

It is interesting to count the number of integration steps required by
each procedure. Clearly, for the fixed step size method
5/0.005 = 1000 integration steps were needed. For the PPC42V method,
however, the number of integration steps needed depends largely on the
behavior of the solution x(t). Note that in regions where x(t) varies
rapidly many integration steps were required while relatively few steps
were taken when x(t) was nearly constant (see Figure 3.6 and Table 3.9).
But this is precisely what we would expect a good variable step size
method to do.

CHAPTER FOUR


IMPLEMENTATION CONSIDERATIONS

Thus far, this thesis has primarily been concerned with the development
of efficient numerical methods for solving nonlinear estimation and
control problems which are suitable for modern parallel computers. In
this chapter, the implementation of these methods is considered. In
particular, a parallel computer architecture is proposed in Section 4.1
which utilizes three levels of parallelism to allow the implementation
of the parallel algorithms discussed in Chapters Two and Three. In
Section 4.2, the execution time of the parallel algorithms is estimated
and compared with that of currently used sequential methods. Finally,
the speedup due to parallelism is estimated using the timing equations
given in Section 4.2.3.

4.1  A Parallel Computer Architecture

Whereas most existing computer systems (parallel or serial) have been
designed as general purpose machines, the parallel computer proposed in
this section may be considered a special purpose machine for
implementing the parallel algorithms discussed in Chapters Two and
Three. The architecture of the proposed computer utilizes many
independent processors capable of operating simultaneously such that
more processing power would be possible than with a single central
processing unit of traditional architecture. Although there is no
reason to believe that the architecture of the proposed parallel
computer is an "optimal" implementation of the parallel algorithms
described in Chapters Two and Three, it may be viewed as a "natural"
implementation.

In view of the structure of the parallel algorithms developed in
Chapters Two and Three and the availability of low-cost microprocessor
systems (available on a single 8 1/2 x 11 inch card), the proposed
parallel computer is organized into three levels of parallelism;
namely:

•  Level  I  (Minimization  Level) 

•  Level  II  (Shooting  Level,  if  applicable) 

•  Level  III  (Integration  Level)

To reduce the possibility of coordination and synchronization
problems with each level, the proposed computer should be synchronous
and utilize a single-instruction-multiple-data (SIMD) stream to efficiently
implement the parallel algorithms. Other considerations which
should be of interest include:

•  Timing requirements for real time control computations
(specifying processor add, multiply and transfer times
such that the execution time is rapid enough to permit the
nonlinear estimation and control computations to be
done in real time)

•  Memory  and  peripheral  requirements 

•  Effects  of  wordsize 

•  Communication  and  interconnection  among  processors 

•  Feasibility  of  implementation  and,  of  course,  cost. 

With these considerations in mind, the organization of
each level of the proposed parallel computer will now be described.

4.1.1 Minimization Level

The minimization level (Level I) is the upper level of the
architecture, at which a finite-dimensional minimization problem, and
with it a search for the appropriate unknowns (parameters or boundary
conditions), is initiated. If M is the number of unknowns to be
optimized, then the search for the unknowns can be performed simultaneously
by M optimization modules. Each optimization module might
consist of a processing element (PE), a local memory (LM) element, and
an integration module (IM). Because we want the searches to be performed
simultaneously, the structure at this level might be organized
as shown in Figure 4.1. Notice that this parallel structure is
ideally suited for evaluating a function and its gradient at M distinct
points simultaneously, which is precisely what is required to implement
the parallel minimization algorithms discussed in Section 3.1.

At  this  level,  the  role  of  the  central  processor  (CP)  is  to: 

•  Initialize  each  processor 

•  Control  the  operation  of  each  processor 

•  Monitor  the  status  of  each  processor 

•  Watch the clock and controls to keep the processors
synchronized during a given iteration

As  indicated  above,  the  role  of  the  optimization  modules  is 
to  implement  the  minimization  phase  of  the  parallel  algorithms  dis¬ 
cussed  in  Chapter  Two.  Because  the  mathematical  computations  required 
by  the  parallel  minimization  algorithm  discussed  in  Section  3.1  are 
relatively sophisticated, the processors at this level should be
sophisticated as well. In fact, the PE's should be pipelined so that the
required vector-matrix operations can be performed very rapidly.

Since  speed  is  a  primary  concern,  cache  memories  might  be 
applicable  for  the  local  memory  units  at  this  level. 


FIGURE 4.1: Parallel Structure at Level I


As  shown  in  Figure  4.1,  the  connection  of  adjacent  proces¬ 
sors  is  not  necessary  since,  by  design,  the  parallel  algorithms  of 
Chapters  Two  and  Three  allow  nearly  all  computations  to  be  performed 
independently  of  the  others.  Note  that  this  indicates  that  relatively 
little  (if  any)  communication  is  required  among  the  processors  at 
this  level. 

Finally, because of the large dynamic range of the computations
required by quasi-Newton methods, the wordlength required
should be relatively long.

4.1.2  Shooting  Level 

If parallel shooting is used to aid the search for the
unknown boundary conditions, then the mission time interval $[t_0, t_f]$
would be divided into N subintervals using the partition

$$t_0 < t_1 < \cdots < t_N = t_f$$

The  task  of  the  processors  at  this  level  is  to  implement 
the  parallel  shooting  phase  of  the  parallel  algorithms  discussed  in 
Sections  2.1  and  2.2  by  finding  the  solution  to  the  appropriate 
initial-value problem over each subinterval simultaneously in parallel.
This phase of the algorithm might be implemented using the parallel
structure  shown  in  Figure  4.2. 

At  the  shooting  level  (Level  II),  the  role  of  each  PE  is  to: 

•  Initialize  the  integration  module 

•  Monitor  the  status  of  each  integration  module 

•  Communicate with the processors at Level I and Level III
to keep computations synchronized.

FIGURE 4.2: Parallel Structure at Level II


Since the processors at this level are mainly used for
"bookkeeping" and "status checks," the PE's need not be very
sophisticated. Also, the local memory units shown in Figure 4.2
might be relatively small in view of the primitive operations performed
at this level. Note that if parallel shooting is not used,
then this level is not necessary. In this case, the integration
module required at Level I simply consists of a refined integration
module, which is discussed next.

4.1.3  Integration  Level

At the integration level (Level III), the processors are
dedicated to the task of integrating ordinary differential equations
(ODE's) over a subinterval using a parallel integration scheme such
as the methods presented in Section 3.2.1. These methods are suggested
for the numerical solution of the initial-value problems
(IVP's) over each subinterval since computations can be sped up
significantly by utilizing more than one processor operating in
parallel on each ODE. To further speed computations, a parallel
integration method could be used to integrate each component of the
right-hand side (RHS) of an IVP. If "L" processors are available
for integrating each component of the RHS and the IVP is of order n,
then this phase of the parallel algorithm might be implemented as
shown in Figure 4.3. Note that when L = 2 the structure shown in
Figure 4.3 is ideally suited to implement the parallel predictor-corrector
pairs presented in Section 3.2.

FIGURE 4.3: Parallel Structure at Level III

With regard to processor and memory requirements, the
processors at this level must be somewhat sophisticated, due to the
mathematical computations involved when evaluating the RHS of an
IVP. The processors, however, need not be pipelined because the
computations required by the parallel integration procedures do
not warrant it. Nevertheless, the processor add and multiply times
should be as small as possible since the integration phase generally
is the most time consuming phase of the parallel algorithms discussed
in Chapter Two. Again, since speed is crucial here, cache
memories should be employed at this level.

Because the numerical solution of an IVP involves knowing
the solution at many points, the solution stored in the local memories
should be accessible to all processors. Generally, in situations
like this, two or more processors may attempt to access the same
memory module during a memory cycle. This phenomenon is called
"memory contention" and is usually rectified by providing the system
with a "memory lock."

The function of the memory lock is to preclude access by
other processors once a processor has initiated a memory access.
Since only one access can be made per memory cycle, one of the
requests must wait. For the system to be efficient, however, the
wait time should be no more than one or two memory cycles.

4.1.4 Coordination of Each Level

Now that the function of each level has been
discussed, the operation of the entire system will be briefly
described.

Basically, the parallel processing would begin at Level I
and proceed to Levels II and III as follows. At Level I, the central
processor would initialize the PE's with an initial approximation of
the state and costate variables at the partition points of the
parallel shooting algorithm. This information, along with the
initial and final times associated with each subinterval, would be
transferred to Level II where the parallel shooting phase of the
algorithm would be initiated. When a function and/or gradient
evaluation is required, the processors at Level III would be activated.
At Level III, each processor would simultaneously integrate its
assigned initial-value problem over its assigned subinterval and use
its local memory as temporary storage for intermediate results. When
the integration phase is complete, the results would be transferred
to global memory, which then would be accessed by the central processor
for the values needed to evaluate an appropriate error function.
Finally, the central processor would evaluate the error function and
decide whether to continue computations or halt.
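
The flow just described can be mimicked in software. The following is only a rough sketch, assuming a scalar state, a forward Euler integrator and a squared continuity mismatch as the error function; the names `integrate_subinterval` and `shooting_error` are hypothetical, and a thread pool merely stands in for the Level III processors.

```python
from concurrent.futures import ThreadPoolExecutor


def integrate_subinterval(f, t_start, t_end, y_start, steps=200):
    """Level III stand-in: integrate y' = f(t, y) over one subinterval.

    Forward Euler is an assumption made to keep the sketch short; it is
    not the integration scheme of the report.
    """
    h = (t_end - t_start) / steps
    t, y = t_start, y_start
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y


def shooting_error(f, partition, guesses):
    """Level I/II stand-in: dispatch one integration per subinterval,
    collect the end values, and form a squared-mismatch error."""
    jobs = list(zip(partition[:-1], partition[1:], guesses))
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        ends = list(pool.map(lambda job: integrate_subinterval(f, *job), jobs))
    # Mismatch between the end of subinterval k and the guessed start
    # value of subinterval k + 1 (a hypothetical choice of error function).
    return sum((e - g) ** 2 for e, g in zip(ends[:-1], guesses[1:]))
```

Each subinterval is integrated independently of the others, which is the property that lets Level III run without communication until the results are gathered for the error function.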

Although it appears that, while one group of processors is
busy at a given level, the remaining processors are idle, this is not
the case because the idle processors are really performing status
checks and other utility functions.

4.2  Parallel  Algorithm  Execution  Time

The  goal  of  this  section  is  to  analytically  determine  a  set 
of  timing  equations  which  can  be  used  to  estimate  the  execution  time 
of  the  parallel  algorithms  discussed  in  Chapters  Two  and  Three.  The  tim¬ 
ing  equations  are  also  used  to  compare  the  execution  times  of  different 
algorithms, as well as to estimate the speed-up due to parallelism.

One way to estimate the execution time of the parallel
algorithms would be to compute the time required per iteration by
taking into account that many arithmetic operations would be performed
in parallel. This can be done by counting the total number
of additions, multiplications, function evaluations, and gradient
evaluations for a given iteration and multiplying the operation
count by representative execution times for these operations. By
adopting this approach, one may estimate the time required for convergence
by using a serial computer (such as an IBM 360) to determine
the number of iterations needed and multiplying this by the estimated
time per iteration. Note, however, that this estimate would be
problem dependent and that communication time between processors and
memory is ignored.

The  assumption  that  processor-memory  communication  time  can 
be  ignored  is  realistic  since  for  many  nonlinear  estimation  and  control 
problems  the  mathematical  computations  performed  by  each  processor 
would  be  significantly  more  time  consuming  than  memory  access  time. 

The execution time of the parallel algorithms will be estimated
by deriving a set of timing equations for the minimization phase and
integration phase separately. At the end of this section, the timing
equations are combined to provide an estimate of the execution time of
the entire parallel algorithm. At this time, it should be pointed out
that the timing equations given in this section are derived assuming
the parallel algorithms are executed on a parallel computer whose
architecture is consistent with that discussed in Section 4.1.

4.2.1 Minimization Phase

In this section, the total number of arithmetic operations
required by the CM, PVM, PDFP and PBFS procedures discussed in
Section 3.1 is counted assuming many of these operations can be
performed in parallel. To form a basis for comparison, an operation
count is also given for the serial ZP, DFP, and BFS procedures
assuming these algorithms are executed sequentially. Deriving the
operation count for each of these methods here would be very
time consuming and repetitious. Therefore, only the operation count
for the PQN algorithm will be derived.

Since  step  1  of  the  PQN  algorithm  can  be  considered  an 
initialization  step,  one  iteration  of  this  method  essentially  con¬ 
sists  of  steps  2,  3,  4  and  5.  Therefore,  the  operation  count 
will  only  include  the  arithmetic  operations  required  to  per¬ 
form  steps  2-5.  The  operation  count  will  be  derived  by  counting 
the  arithmetic  operations  required  by  each  step,  one  at  a  time,  and 
combined  at  the  end  to  obtain  an  overall  operation  count. 

Starting with step 2 then, observe that a sequence of linear
systems of equations must be solved during this step. To solve each
linear system as rapidly as possible, any one of the algorithms
discussed in references [14], [15], and [41] may be employed.
Among these methods, the procedure reported by Pease [41] is preferable
since it only requires n processors and, as such, could be
implemented on the parallel computer proposed in Section 4.1.

Basically, to solve a general linear system of the form
$ax = b$, where $a \in R^{n \times n}$ and $x, b \in R^{n}$, using Pease's algorithm, we must
augment the "a" matrix with the vector "b" by placing "b" in the
n+1 column as follows:

$$A = [\,a : b\,]$$

and solve $x = a^{-1}b$ by performing the following operations in parallel
on the rows of A:

    for j = 1 step 1 until n do
    begin
        t[i] := a[i,j] / a[j,j],                i = 1,2,...,n,  i ≠ j
        for k = j+1 step 1 until n+1 do
            a[i,k] := a[i,k] - t[i] * a[j,k],   i = 1,2,...,n,  i ≠ j
    end;
    x[i] := a[i,n+1] / a[i,i],                  i = 1,2,...,n
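
As a check on the elimination loop above, the following sketch runs the same row operations serially and counts how many parallel steps they would take if each row had its own processor. It is an illustration under assumptions, not Pease's implementation: the function name `solve_row_parallel` is hypothetical, no pivoting is performed (so every `a[j][j]` must be nonzero), each division is counted as one multiplication step, and the final scaling of `x` is left out of the count.

```python
def solve_row_parallel(a, b):
    """Solve a x = b by row-wise elimination on the augmented matrix [a : b]."""
    n = len(a)
    A = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    add_steps = mul_steps = 0
    for j in range(n):
        # One parallel step: every row i != j forms its multiplier t[i].
        t = [A[i][j] / A[j][j] if i != j else 0.0 for i in range(n)]
        mul_steps += 1
        for k in range(j + 1, n + 1):
            # One parallel multiply and one parallel add across all rows i != j.
            for i in range(n):
                if i != j:
                    A[i][k] -= t[i] * A[j][k]
            mul_steps += 1
            add_steps += 1
    # Final scaling by the diagonal (not included in the step counts).
    x = [A[i][n] / A[i][i] for i in range(n)]
    return x, add_steps, mul_steps
```

With these counting conventions the loop takes n(n+1)/2 addition steps and n(n+3)/2 multiplication steps, whatever the entries of the matrix.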

If the arithmetic operations required by Pease's algorithm
are counted assuming they are done in parallel, then only n(n+1)/2
additions and n(n+3)/2 multiplications are needed by this method.
Note that this is significantly faster than a serial Gaussian
elimination procedure, which requires $O(n^3)$ multiplications and additions.
In summary, if Pease's algorithm is used to solve the linear
systems required in step 2 of the PQN method, then n(n+1)/2 additions
and n(n+3)/2 multiplications must be performed for each system of
equations. Note that the linear systems in step 2 are increasing
in size and step 2 must be executed n-1 times. Therefore, the total
number of operations which must be performed during step 2 is given
by:

$$\frac{n(n^2-1)}{6}\,A + \frac{n(n^2+3n-4)}{6}\,M \tag{4.2.1-1}$$

performed in parallel. By straightforward evaluation, it can be
shown that $(8n^2 + 11n)$ multiplications and $(7n^2 + 2n - 2)$ additions must
be performed in step 4 of the PQN algorithm assuming "n" processors
are utilized for each vector-matrix operation and $\phi > 0$. If $\phi = 0$,
then the update is somewhat simpler and as a result only $(4n^2 + 4n)$
multiplications and $(2n^2 + n)$ additions would be needed.

In step 5, a single line search is required. If we assume
"L" function evaluations are performed during the line search and
$f(x^{(k+1)})$ is evaluated, then L+1 function evaluations are performed
during this step.

To  obtain  the  overall  operation  count,  the  operation  counts 
obtained  for  steps  2-5  are  combined.  Hence,  the  overall  operation 
count  for  one  iteration  of  the  PQN  algorithm  is  given  by: 

$$\text{PQN:}\quad \left(7n^2 + 4n + \frac{n(n^2-1)}{6} - 2\right) A + \left(8n^2 + 11n + \frac{n(n^2+3n-4)}{6}\right) M + (L+1)\,FE + 1\,GE$$

where
n denotes the dimensionality of the problem,
A denotes additions,
M denotes multiplications,
FE denotes function evaluations,
GE denotes gradient evaluations,
and L denotes the number of function evaluations during a line search.
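
The cubic terms in the PQN count can be traced back to step 2. As a worked check, assume the n-1 linear systems solved in step 2 have sizes k = 1, 2, ..., n-1 (an inference from "increasing in size" and "executed n-1 times", not something the text states outright). Summing the per-system counts k(k+1)/2 and k(k+3)/2 then gives:

$$\sum_{k=1}^{n-1} \frac{k(k+1)}{2} = \frac{n(n^2-1)}{6}, \qquad \sum_{k=1}^{n-1} \frac{k(k+3)}{2} = \frac{n(n-1)(n+4)}{6} = \frac{n(n^2+3n-4)}{6}$$

For n = 2, for instance, the sums reduce to the single k = 1 terms, 1 addition and 2 multiplications, which agrees with both closed forms.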

Using the operation count above, it is a simple matter to
estimate the execution time for one iteration of the PQN algorithm.
Specifically, if $t_a$, $t_m$, $t_{fe}$ and $t_{ge}$ denote the processor add time,
processor multiply time, the time required for one function evaluation
and the time required for one gradient evaluation respectively, then
by multiplying the operation count above by these quantities the
execution time for one iteration of the PQN may be obtained as follows:

$$\text{PQN:}\quad t_{pi} = \left(7n^2 + 4n + \frac{n(n^2-1)}{6} - 2\right) A\,t_a + \left(8n^2 + 11n + \frac{n(n^2+3n-4)}{6}\right) M\,t_m + (L+1)\,FE\,t_{fe} + 1\,GE\,t_{ge} \tag{4.2.1-2}$$

Recall that, when $\phi = 0$ ($\phi = 1$) in the PQN algorithm,
the PDFP (PBFS) method is obtained. Observe that if $\phi = 0$, then
the update in step 4 is somewhat simpler and the time per iteration
is reduced to:

$$\text{PDFP:}\quad t_{pi} = \left(2n^2 + 3n + \frac{n(n^2-1)}{6}\right) A\,t_a + \left(4n^2 + 4n + \frac{n(n^2+3n-4)}{6}\right) M\,t_m + (L+1)\,FE\,t_{fe} + 1\,GE\,t_{ge} \tag{4.2.1-3}$$

Also, note that the time for one iteration of the PBFS method
is the same as the PQN algorithm since the PBFS method is only a special
case of the PQN algorithm when $\phi > 0$.

Following a similar procedure as outlined above, the execution
time of one iteration of the CM and PVM procedures can be shown
to be:

$$\text{CM:}\quad t_{pi} = (5 + n)\,M\,t_m + (3 + 2n)\,A\,t_a + (L+1)\,FE\,t_{fe} \tag{4.2.1-4}$$

$$\text{PVM:}\quad t_{pi} = \left(2n^2 + n\left[4 + \frac{3(n-1)}{2}\right] - 1\right) A\,t_a + \left(2n^2 + 3n\left[1 + \frac{n-1}{2}\right] + 2\right) M\,t_m + (L+1)\,FE\,t_{fe} + 1\,GE\,t_{ge} \tag{4.2.1-5}$$

As indicated earlier, it is useful to estimate the execution
time for one iteration of the ZP, DFP and BFS procedures assuming all
computations are done sequentially. By performing an operation count
and multiplying it by representative execution times, it is straightforward
to show that the execution times of one iteration of the serial
methods are:

$$\text{ZP:}\quad t_{pi} = (n^2 + 3n + 2)\,M\,t_m + (n^2 + 4n + 1)\,A\,t_a + [(n+2)L + 1]\,FE\,t_{fe} \tag{4.2.1-6}$$

$$\text{DFP:}\quad t_{pi} = (6n^2 + 3n)\,M\,t_m + (4n^2 + 2n - 2)\,A\,t_a + (L+1)\,FE\,t_{fe} + 1\,GE\,t_{ge} \tag{4.2.1-7}$$

$$\text{BFS:}\quad t_{pi} = (7n^2 + 4n + 2)\,M\,t_m + (4n^2 + 5n - 4)\,A\,t_a + (L+1)\,FE\,t_{fe} + 1\,GE\,t_{ge} \tag{4.2.1-8}$$
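
For side-by-side comparison, the timing equations of this section can be collected into one small routine. The sketch below is a transcription of eqns. (4.2.1-2) through (4.2.1-8) as read here; the function name `minimization_time` and its argument order are hypothetical, and PBFS is treated as identical to PQN, as stated above.

```python
def minimization_time(method, n, L, t_a, t_m, t_fe, t_ge):
    """Time for one iteration of the minimization phase of `method`."""
    step2_adds = n * (n**2 - 1) / 6           # step 2 additions (PQN family)
    step2_muls = n * (n**2 + 3 * n - 4) / 6   # step 2 multiplications
    pqn = (7 * n**2 + 4 * n + step2_adds - 2,
           8 * n**2 + 11 * n + step2_muls, L + 1, 1)
    # method: (additions, multiplications, function evals, gradient evals)
    counts = {
        "PQN": pqn,
        "PBFS": pqn,
        "PDFP": (2 * n**2 + 3 * n + step2_adds,
                 4 * n**2 + 4 * n + step2_muls, L + 1, 1),
        "CM": (3 + 2 * n, 5 + n, L + 1, 0),
        "PVM": (2 * n**2 + n * (4 + 3 * (n - 1) / 2) - 1,
                2 * n**2 + 3 * n * (1 + (n - 1) / 2) + 2, L + 1, 1),
        "ZP": (n**2 + 4 * n + 1, n**2 + 3 * n + 2, (n + 2) * L + 1, 0),
        "DFP": (4 * n**2 + 2 * n - 2, 6 * n**2 + 3 * n, L + 1, 1),
        "BFS": (4 * n**2 + 5 * n - 4, 7 * n**2 + 4 * n + 2, L + 1, 1),
    }
    adds, muls, fes, ges = counts[method]
    return adds * t_a + muls * t_m + fes * t_fe + ges * t_ge
```

Setting one of the four unit times to 1 and the rest to 0 returns the corresponding operation count, which is a convenient way to compare the methods term by term.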

Note that if the number of additions and multiplications performed
during one function and gradient evaluation were known, then the
execution time could be estimated as a function of processor add and
multiply times. In the next section, $t_{fe}$ and $t_{ge}$ will be estimated in
terms of $t_a$ and $t_m$, which will allow the speed-up due to parallelism to
be estimated in Section 4.2.3.

4.2.2 Integration Phase

In  the  previous  section,  the  execution  times  of  the  minimiza¬ 
tion  phase  of  the  parallel  algorithms  discussed  in  Chapter  Two  were 
estimated in terms of processor add time, multiply time and the time
required for a function and gradient evaluation. In the context of the
parallel  algorithms  described  in  Chapter  Two,  a  function  (or  gradient) 
evaluation  consists  of  integrating  a  set  of  differential  equations  over 
an  appropriate  time  interval  and  evaluating  a  suitably  defined  error 
function. 

Thus, the function (or gradient) evaluation time depends primarily
on the numerical integration method employed and the complexity
of the differential equations being integrated. This will be made more
precise by estimating the execution time required by the parallel
predictor-corrector (PPC) method discussed in Section 3.2 and comparing
it with the execution time required by a serial Adams predictor-corrector
(APC) method.

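To begin, consider the PPC integration method defined by:

$$y^p_{i+1} = y^c_{i-1} + \frac{h}{3}\left(8f^p_i - 5f^c_{i-1} + 4f^c_{i-2} - f^c_{i-3}\right) \tag{4.2.2-1}$$

$$y^c_i = y^c_{i-1} + \frac{h}{24}\left(9f^p_i + 19f^c_{i-1} - 5f^c_{i-2} + f^c_{i-3}\right) \tag{4.2.2-2}$$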

As described in Section 3.2, the predictor and corrector
equations above may be evaluated simultaneously on two separate processors.
Also, the right-hand-side (RHS) of the initial-value
problem (IVP) can be evaluated simultaneously since the sequence of
computations is

    PE#1:  ...  →  y^p_{i+1}  →  f^p_{i+1}  →  ...
    PE#2:  ...  →  y^c_i      →  f^c_i      →  ...

By performing the computations in this manner, effectively
only one RHS evaluation is performed at each step. Also, when evaluating
eqns. (4.2.2-1) and (4.2.2-2), four additions and five multiplications
must be performed. Therefore, the total number of arithmetic
operations which must be carried out per step is:

$$4A + 5M + 1\,RHS \tag{4.2.2-3}$$
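
The data dependencies of the predictor-corrector pair can be seen in a short serial simulation. The sketch below is illustrative only: the function name `ppc_integrate` is hypothetical, a scalar ODE is assumed, and the four start-up values are taken from the exact solution of the test problem rather than from a start-up procedure. The point is that within one step the predictor for $t_{i+1}$ and the corrector for $t_i$ read only values from the previous step, so the two could run on separate processors with one RHS evaluation each.

```python
import math


def ppc_integrate(f, t0, h, start, n_steps):
    """Integrate y' = f(t, y) with the pair (4.2.2-1), (4.2.2-2).

    `start` holds y at t0, t0+h, t0+2h, t0+3h (assumed start-up data).
    Returns the corrected values y^c_0 ... y^c_{n_steps}.
    """
    yc = list(start[:3])                              # y^c_0 .. y^c_2
    fc = [f(t0 + k * h, yc[k]) for k in range(3)]     # f^c_0 .. f^c_2
    fp = f(t0 + 3 * h, start[3])                      # f^p_3
    for i in range(3, n_steps + 1):
        # "PE#1": predictor for t_{i+1}, built from old data only.
        yp_next = yc[i - 1] + h / 3 * (8 * fp - 5 * fc[i - 1] + 4 * fc[i - 2] - fc[i - 3])
        # "PE#2": corrector for t_i, built from the same old data.
        yc_i = yc[i - 1] + h / 24 * (9 * fp + 19 * fc[i - 1] - 5 * fc[i - 2] + fc[i - 3])
        # One RHS evaluation per processor.
        fp = f(t0 + (i + 1) * h, yp_next)
        yc.append(yc_i)
        fc.append(f(t0 + i * h, yc_i))
    return yc


# Test problem y' = -y, y(0) = 1, integrated to t = 1 with h = 0.01.
start = [math.exp(-0.01 * k) for k in range(4)]
ys = ppc_integrate(lambda t, y: -y, 0.0, 0.01, start, 100)
```

In a real two-processor implementation the two assignments inside the loop, and then the two RHS evaluations, would each be issued simultaneously.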


Now suppose N steps are taken to solve an IVP over some
interval of time. Then the time for one function evaluation may be
estimated using the following expression:

$$t_{fe} = \left(4A\,t_a + 5M\,t_m + 1\,RHS\,t_{rhs}\right)N \tag{4.2.2-4}$$

where $t_{rhs}$ is the time required to perform one RHS evaluation.
Finally, if there are $A_{rhs}$ additions and $M_{rhs}$ multiplications when
evaluating the RHS of an IVP, then $t_{fe}$ may be estimated as follows:

$$t_{fe} = (4 + A_{rhs})N\,A\,t_a + (5 + M_{rhs})N\,M\,t_m \tag{4.2.2-5}$$

Note that, if the function being evaluated is a function
of n variables, then the gradient of such a function has n components.
If we assume that each component of the gradient requires approximately
the same time to evaluate as the function itself, then the
time required to evaluate the gradient is approximately:

$$t_{ge} = n\,t_{fe} \tag{4.2.2-6}$$

To form a basis for comparison, it is instructive to count
the total number of arithmetic operations which must be performed by
the serial Adams predictor-corrector (APC) pair:

$$y^p_{i+1} = y^c_i + \frac{h}{24}\left(55f^c_i - 59f^c_{i-1} + 37f^c_{i-2} - 9f^c_{i-3}\right) \tag{4.2.2-7}$$

$$y^c_{i+1} = y^c_i + \frac{h}{24}\left(9f^p_{i+1} + 19f^c_i - 5f^c_{i-1} + f^c_{i-2}\right) \tag{4.2.2-8}$$

From eqns. (4.2.2-7) and (4.2.2-8), it is straightforward to
show that 8A + 11M + 2 RHS operations must be carried out sequentially
to implement the serial APC pair.

As before, if $A_{rhs}$ additions and $M_{rhs}$ multiplications are
performed during a RHS evaluation, then the time required for one
function and gradient evaluation is simply:

$$t_{fe} = (8 + 2A_{rhs})N\,A\,t_a + (11 + 2M_{rhs})N\,M\,t_m \tag{4.2.2-9}$$

and

$$t_{ge} = n\,t_{fe} \tag{4.2.2-10}$$

Note that two RHS evaluations must be performed at each
step of the serial APC method while only one RHS evaluation is
required by the PPC method. Since a RHS evaluation is the most
time consuming operation that must be performed, the PPC
method will execute nearly twice as fast as the serial APC method.
Ideally then, the speed-up due to parallelism would be approximately
two in this case.
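
The factor of two can be checked directly from eqns. (4.2.2-5) and (4.2.2-9). The following sketch uses hypothetical function names, and the parameter values in the example call are illustrative rather than taken from the report.

```python
def t_fe_ppc(N, A_rhs, M_rhs, t_a, t_m):
    """Eqn. (4.2.2-5): parallel pair, one RHS evaluation per step."""
    return N * ((4 + A_rhs) * t_a + (5 + M_rhs) * t_m)


def t_fe_apc(N, A_rhs, M_rhs, t_a, t_m):
    """Eqn. (4.2.2-9): serial Adams pair, two RHS evaluations per step."""
    return N * ((8 + 2 * A_rhs) * t_a + (11 + 2 * M_rhs) * t_m)


def integration_speedup(N, A_rhs, M_rhs, t_a, t_m):
    """Ratio of serial to parallel function-evaluation time."""
    return t_fe_apc(N, A_rhs, M_rhs, t_a, t_m) / t_fe_ppc(N, A_rhs, M_rhs, t_a, t_m)


# Illustrative values only: a fairly expensive RHS and equal unit times.
ratio = integration_speedup(N=100, A_rhs=30, M_rhs=40, t_a=1.0, t_m=1.0)
```

Algebraically the ratio equals 2 plus $t_m$ divided by the per-step PPC time, so it approaches two from above as the RHS becomes more expensive, and it does not depend on N.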

4.2.3  Overall  Execution  Time

In Sections 4.2.1 and 4.2.2, the execution times of the
minimization phase and integration phase of the parallel algorithms
discussed in Chapter Two were estimated. Also, in these sections,
the timing equations for some widely used serial procedures were
given. In this section, these results will be utilized to estimate
the execution time of the overall algorithm, as well as the speed-up
due to parallelism.

To estimate the overall execution time of the parallel
algorithm discussed in Chapter Two, we simply substitute eqns.
(4.2.2-5) and (4.2.2-6) into one of the timing equations for the
parallel minimization algorithms. Similarly, the overall execution
time of a completely serial method may be obtained by substituting
eqns. (4.2.2-9) and (4.2.2-10) into one of the timing equations
associated with the serial minimization procedures.

For example, if the serial ZP or DFP method were used
with the serial APC method, the overall execution time for one
iteration is:

ZP/APC:

$$t_{pi} = \{n^2 + 3n + 2 + [(n+2)L + 1](11 + 2M_{rhs})N\}\,M\,t_m + \{n^2 + 4n + 1 + [(n+2)L + 1](8 + 2A_{rhs})N\}\,A\,t_a \tag{4.2.2-11}$$

DFP/APC:

$$t_{pi} = \{6n^2 + 3n + (L+n+1)(11 + 2M_{rhs})N\}\,M\,t_m + \{4n^2 + 2n - 2 + (L+n+1)(8 + 2A_{rhs})N\}\,A\,t_a \tag{4.2.2-12}$$


On the other hand, if the CM or PVM algorithm were used with
the PPC integration method, the time per iteration of the overall
algorithm would be:

CM/PPC:

$$t_{pi} = \{5 + n + (L+1)(5 + M_{rhs})N\}\,M\,t_m + \{3 + 2n + (L+1)(4 + A_{rhs})N\}\,A\,t_a \tag{4.2.2-13}$$

PVM/PPC:

$$t_{pi} = \{2n^2 + 3n + 2 + \tfrac{3}{2}n(n-1) + (L+n+1)(5 + M_{rhs})N\}\,M\,t_m + \{2n^2 + 4n - 1 + \tfrac{3}{2}n(n-1) + (L+n+1)(4 + A_{rhs})N\}\,A\,t_a \tag{4.2.2-14}$$

Note that if the processor add time, $t_a$, and multiply time,
$t_m$, were known along with n, L, N, $A_{rhs}$ and $M_{rhs}$, then the time per
iteration, $t_{pi}$, could be estimated using eqn. (4.2.2-11), (4.2.2-12),
(4.2.2-13) or (4.2.2-14). Also, the timing equations may be used to
estimate the speed-up due to parallelism, the details of which are
given in the following theorem:

Theorem 4.1: Let $N \gg n^2$ and $2(M_{rhs} + A_{rhs}) \gg 19$. Then the speed-up
due to parallelism, S, satisfies the following inequalities:

i. $$S = \frac{t_{pi}\ \text{of the DFP/APC method}}{t_{pi}\ \text{of the PVM/PPC method}} \ge 2$$

ii. $$S = \frac{t_{pi}\ \text{of the ZP/APC method}}{t_{pi}\ \text{of the CM/PPC method}} \ge \frac{2[(n+2)L+1]}{L+1}$$


Proof: To prove i.: Dividing eqn. (4.2.2-12) by (4.2.2-14) and
using the fact that $N \gg n^2$, we have

$$S = \frac{19 + 2(M_{rhs} + A_{rhs})}{9 + M_{rhs} + A_{rhs}}$$

But since $2(M_{rhs} + A_{rhs}) > 19$, S may be bounded as follows:

$$2 < S < 2.054$$

To prove ii.: Dividing eqn. (4.2.2-11) by eqn. (4.2.2-13) and
recalling that $N \gg n^2$, we have

$$S = \frac{(n+2)L+1}{L+1}\left[\frac{19 + 2(M_{rhs} + A_{rhs})}{9 + M_{rhs} + A_{rhs}}\right]$$

But $2(M_{rhs} + A_{rhs}) > 19$ implies that

$$\frac{2[(n+2)L+1]}{L+1} < S < \frac{2.054\,[(n+2)L+1]}{L+1}$$
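
The two inequalities can also be checked numerically from the overall timing equations. The sketch below transcribes eqns. (4.2.2-11) through (4.2.2-14) as read here; the function names are hypothetical and the parameter values are illustrative choices satisfying $N \gg n^2$, not values from the report.

```python
def zp_apc(n, L, N, A_rhs, M_rhs, t_a, t_m):
    """Eqn. (4.2.2-11)."""
    k = (n + 2) * L + 1
    return ((n**2 + 3 * n + 2 + k * (11 + 2 * M_rhs) * N) * t_m
            + (n**2 + 4 * n + 1 + k * (8 + 2 * A_rhs) * N) * t_a)


def dfp_apc(n, L, N, A_rhs, M_rhs, t_a, t_m):
    """Eqn. (4.2.2-12)."""
    k = L + n + 1
    return ((6 * n**2 + 3 * n + k * (11 + 2 * M_rhs) * N) * t_m
            + (4 * n**2 + 2 * n - 2 + k * (8 + 2 * A_rhs) * N) * t_a)


def cm_ppc(n, L, N, A_rhs, M_rhs, t_a, t_m):
    """Eqn. (4.2.2-13)."""
    k = L + 1
    return ((5 + n + k * (5 + M_rhs) * N) * t_m
            + (3 + 2 * n + k * (4 + A_rhs) * N) * t_a)


def pvm_ppc(n, L, N, A_rhs, M_rhs, t_a, t_m):
    """Eqn. (4.2.2-14)."""
    k = L + n + 1
    q = 1.5 * n * (n - 1)
    return ((2 * n**2 + 3 * n + 2 + q + k * (5 + M_rhs) * N) * t_m
            + (2 * n**2 + 4 * n - 1 + q + k * (4 + A_rhs) * N) * t_a)


# Illustrative parameters only (N >> n^2, equal add and multiply times).
p = dict(n=4, L=3, N=1000, A_rhs=20, M_rhs=20, t_a=1e-6, t_m=1e-6)
S_i = dfp_apc(**p) / pvm_ppc(**p)
S_ii = zp_apc(**p) / cm_ppc(**p)
bound_ii = 2 * ((p["n"] + 2) * p["L"] + 1) / (p["L"] + 1)
```

For small N the lower-order terms in n can pull the ratios below these bounds, which is why the theorem assumes $N \gg n^2$.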

To illustrate the impact of parallelism, as well as the
tightness of the bounds given by Theorem 4.1, suppose $t_a$ = 200 nsec,
$t_m$ = 1000 nsec, n=9, L=4, N=100, $A_{rhs}$=33 and $M_{rhs}$=43. Then by substituting
these quantities into eqns. (4.2.2-11), (4.2.2-12), (4.2.2-13) and
(4.2.2-14), we have:

t  .  =  0.1*5)  3  seconds  for  ZP/APC 
Pi 

$t_{pi}$ = 0.1432 seconds for DFP/APC

$t_{pi}$ = 0.02721 seconds for CM/PPC

$t_{pi}$ = 0.0710 seconds for PVM/PPC

Observe that the results above indicate that one iteration
of the parallel algorithms requires significantly less time to execute

CHAPTER FIVE


PERFORMANCE  OF  PARALLEL  ALGORITHMS 

In this chapter, the performance of the parallel identification,
estimation and control algorithms devised in this thesis will
be evaluated. In Section 5.1, the robustness of the parallel variable
metric algorithm discussed in Section 3.1.1 is demonstrated by solving
an optimal control problem associated with a Van der Pol process.
All aspects of the parallel algorithms are tested including parallel
shooting and adaptive mesh selection. In Section 5.2, the performance
of the parallel state and parameter estimation algorithms is evaluated
by identifying the aerodynamic parameters and the initial state
of the lateral equations of motion for a T-23 aircraft. The indirect
and direct control algorithms developed in Section 2.1 are then utilized
in Section 5.3 to design a controller for controlling the longitudinal
equations of motion for an F-8 Crusader aircraft. Finally, in
Section 5.4, the performance of the parallel adaptive control algorithms
discussed in Section 2.3 is evaluated for the F-8 aircraft.

To establish a basis for comparison, the serial Davidon-Fletcher-Powell
(DFP), Broyden-Fletcher-Shanno (BFS), and Zangwill-Powell
(ZP) methods, along with Straeter's parallel variable metric
(PVM) algorithm, the Chazan-Miranker (CM) method, and the parallel
quasi-Newton (PQN) method will be employed to minimize the appropriate
error functions described in Chapter Two. The gradients required by
the variable metric algorithms were obtained by finite-differencing
and the line searches were implemented by fitting a quadratic function
through three points. All of the parallel algorithms were coded in
FORTRAN IV and executed on an IBM 3033 (a serial computer) since a
parallel computer was not accessible.

5.1  Evaluation  of  Parallel  Algorithm  Performance

In this section, the Van der Pol system presented in Section
3.3 will be used to study the following:

•  The robustness of the parallel variable metric algorithm discussed
in Section 3.1.

•  The convergence properties of the indirect control algorithm discussed
in Section 2.2 when different integration methods are
employed.

•  The effect the number of subintervals has on convergence of the
parallel shooting algorithm presented in Section 2.3.

•  The convergence properties of the adaptive mesh selection algorithm
described in Section 3.1.

To begin this study, recall from Section 3.3 that the Van
der Pol system may be written as follows:

$$\dot{x}_1(t) = x_2(t) \tag{5.1-1}$$

$$\dot{x}_2(t) = a(t)\,[1 - x_1^2(t)]\,x_2(t) - x_1(t) + u(t) \tag{5.1-2}$$

where $x(0) = x_0$ is known, a(t) is a parameter which reflects a particular
system's dynamics, and u(t) is a control variable.

The problem considered in this section is to obtain the
optimal control, u(t), which minimizes the cost functional:

    J = \int_0^5 [x_1^2(t) + x_2^2(t) + u^2(t)] dt    (5.1-3)

subject to the satisfaction of eqns. (5.1-1) and (5.1-2). From the
necessary conditions of optimality given in Section 2.3, it is easy
to show that the nonlinear two-point boundary value problem (NTPBVP)
associated with the Van der Pol system is given by:

State Equations

    \dot{x}_1(t) = x_2(t)    (5.1-4)

    \dot{x}_2(t) = a(t) [1 - x_1^2(t)] x_2(t) - x_1(t) + u(t)    (5.1-5)

Costate Equations

    \dot{\lambda}_1(t) = \lambda_2(t) [2 a(t) x_1(t) x_2(t) + 1] - 2 x_1(t)    (5.1-6)

    \dot{\lambda}_2(t) = -\lambda_1(t) - \lambda_2(t) a(t) [1 - x_1^2(t)] - 2 x_2(t)    (5.1-7)

Boundary Conditions

    x(0) = [x_{10}, x_{20}]^T    (5.1-8a)

    \lambda(5) = [0, 0]^T    (5.1-8b)

Optimal Control

    u(t) = -\frac{1}{2} \lambda_2(t)    (5.1-9)


If ordinary shooting is used to solve the NTPBVP above, then
an appropriate error function which must be minimized is simply:

    E = \| \lambda(5) \|^2    (5.1-10)
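
The shooting error can be evaluated by integrating the state and
costate equations forward from a trial initial costate and measuring
how far the terminal costate is from zero. The sketch below is an
illustration only: it uses a classical fourth-order Runge-Kutta
integrator as a stand-in for the predictor-corrector methods of this
report, and the names ntpbvp_rhs, rk4_integrate and shooting_error,
as well as the default of 100 steps, are assumptions made for the
example.

```python
from typing import Callable, List, Sequence


def ntpbvp_rhs(y: Sequence[float], a: float = 1.0) -> List[float]:
    """Right-hand sides of the state and costate equations, with
    y = [x1, x2, lam1, lam2] and the control eliminated via u = -lam2/2."""
    x1, x2, lam1, lam2 = y
    u = -0.5 * lam2
    return [
        x2,
        a * (1.0 - x1 * x1) * x2 - x1 + u,
        lam2 * (2.0 * a * x1 * x2 + 1.0) - 2.0 * x1,
        -lam1 - lam2 * a * (1.0 - x1 * x1) - 2.0 * x2,
    ]


def rk4_integrate(rhs: Callable[[Sequence[float]], List[float]],
                  y0: Sequence[float], t_end: float, steps: int) -> List[float]:
    """Fixed-step classical Runge-Kutta integration from t = 0 to t_end."""
    h = t_end / steps
    y = list(y0)
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k1)])
        k3 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k2)])
        k4 = rhs([yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y


def shooting_error(lam0: Sequence[float], x0: Sequence[float],
                   t_end: float = 5.0, steps: int = 100) -> float:
    """E = ||lambda(t_end)||^2 for a trial initial costate lam0."""
    y_end = rk4_integrate(ntpbvp_rhs, list(x0) + list(lam0), t_end, steps)
    return y_end[2] * y_end[2] + y_end[3] * y_end[3]
```

A minimization routine would then adjust lam0 until shooting_error is
acceptably small.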

To demonstrate the effectiveness of ordinary shooting and the
parallel algorithms developed in Chapters Two and Three, these
methods were used to determine the initial costate which causes eqn.
(5.1-10) to be minimized, given that

    a(t) = 1  \forall t \in [0, 5]

and

    x(0) = [0  1]^T.

The initial costate vector was chosen as \lambda(0) = [0.5  5.0]^T,
which caused the forward integration of eqns. (5.1-4), (5.1-5), (5.1-6)
and (5.1-7) to be well defined. As a result, the NTPBVP could be
easily solved for the optimal unconstrained control which is shown in
Figure 5.1.

In many instances, due to physical constraints, the optimal
control may be bounded. To accommodate problems of this type, the
bounded control algorithm described in Section 2.3 can be employed.
To demonstrate the effectiveness of this procedure, the optimal
control problem above was solved assuming |u(t)| \le 0.8 \forall t \in [0, 5].
The optimal bounded control for this example is shown in Figure 5.1.
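
Because the Hamiltonian of this problem is quadratic in u, minimizing
it pointwise over |u| \le 0.8 amounts to saturating the unconstrained
control -\lambda_2(t)/2 at the bound. The short sketch below illustrates
this saturation; it is offered as an assumption about how the bound
can be imposed, since the algorithm of Section 2.3 is not reproduced
in this chapter, and the function name is hypothetical.

```python
def bounded_control(lam2: float, u_max: float = 0.8) -> float:
    """Saturate the unconstrained control u = -lam2/2 at +/- u_max."""
    u = -0.5 * lam2
    return max(-u_max, min(u_max, u))
```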

The ability of the parallel initial costate algorithm to
reduce the error function defined by eqn. (5.1-10) is shown in Tables
5.1-5.4. Since the parallel algorithms were simulated on an IBM 3033
(a serial computer), the total number of function and gradient
evaluations shown in Tables 5.1-5.4 reflects the fact that an advanced
computer with n = 2 or n + 1 = 3 levels of parallelism should be used to
execute the Chazan-Miranker method or Straeter's method, respectively.
The results indicate:

•  For initial values of \lambda(t_0) close to optimum, the Chazan-Miranker
procedure is capable of speeding up convergence by nearly a factor
of two over the DFP and ZP methods. Straeter's method is much more
effective than the DFP method in reducing the error function for
about the same number of function and gradient evaluations (Table 5.1).

•  For initial values of \lambda(t_0) far from optimal, the parallel
algorithms are capable of reducing the error function by several orders
of magnitude compared with the DFP method. Although the ZP method
managed to reduce the error function the most, this was achieved at
the expense of many function evaluations (Table 5.2).

•  When the control is constrained (|u(t)| \le 0.8) and the initial value
of \lambda(t_0) is selected near the optimum, all procedures converged.
Note, however, that the ZP method required a relatively large number of
function evaluations to do so (Table 5.3).

•  If u(t) is constrained, Straeter's method is clearly superior to the
DFP method when \lambda(t_0) is initially far from optimal. Observe that
although the CM method required many function evaluations,
convergence did result, whereas the ZP method failed to locate the minimum
in this case (Table 5.4).

5.1.1  Robustness  of  Parallel  Minimization  Algorithms 

To study the robustness of the parallel minimization
algorithms and the effect different integration schemes have on convergence,
ordinary shooting was used to solve the NTPBVP represented by eqns.
(5.1-4) - (5.1-8). As in the previous example, the error function
which must be minimized is E = \| \lambda(5) \|^2. The methods used to
integrate eqns. (5.1-4) - (5.1-7) forward in time were the Adams
predictor-corrector (APC) pair given by eqns. (4.2.2-7), (4.2.2-8), the
parallel predictor-corrector (PPC42) pair given by eqns. (3.2.1-1a),
(3.2.1-1b), and the parallel predictor-corrector variable step size
(PPC42V) method, discussed in Section 3.2.2.

Because of the expense incurred when simulating the parallel
algorithms, only two of the parallel minimization methods were
considered, namely the CM and PVM algorithms. As shown in Section 3.1.3,
the rate of convergence of the PVM algorithm depends primarily
on a weighting parameter which defines a set of linearly independent
vectors denoted by:

    \Sigma \triangleq (\sigma_1, \sigma_2, \ldots, \sigma_n) = c I_n

where

    I_n is an n x n identity matrix and

    c is a scalar weighting parameter which is fixed for all
    iterations.

By varying the weighting parameter over a large range (say 10^{-9} \le c
\le 10^{-1}) the robustness of the PVM method can be evaluated. Also,
integration effects can be measured by using one of the integration
methods cited earlier. Because the APC and PPC42 integration methods
are fixed step size procedures, the step size must be selected a priori.
Since the step size, h, must be sufficiently small to assure accurate
results without an excessive number of integration steps, h was set
to 5/100 = 0.05. For the variable step size method (PPC42V), a 5-6
digit accuracy requirement was requested, which meant that the step
size must be varied to maintain the local truncation error below 10^{-4}.
Initially, the integration step size was set to h = 0.05, but in order
to meet the accuracy requirement imposed, the step size was
immediately reduced to h = 0.025. After this initial adjustment, the step
size remained at h = 0.025 for the remainder of the integration
interval.

At this point, simulations were performed to determine the
convergence properties of the indirect control algorithm when the
integration methods described above are employed. The results are
summarized in Figure 5.2.

The results indicate:

•  The robustness of the PVM algorithm is enhanced the most when the
PPC42 integration scheme is employed, i.e., for a wide range of c
(10  <_  c  <_  10  ),  the  performance  of  the  PVM  algorithm  is  insensi¬ 

tive  to  the  specific  value  of  the  weighting  parameter. 

•  The  parallel  integration  methods  enhance  the  robustness  of  the  PVM 
algorithm  more  than  the  serial  APC  method;  although  the  PVM  method 
could  be  tuned  to  converge  the  fastest  when  the  APC  method  was 
employed  (see  Figure  5.2). 

[Figure: total number of iterations required for convergence versus
weighting parameter]

Before leaving this section, a few words should be said
about the robustness of the Chazan-Miranker (CM) algorithm. To use
the CM algorithm, a monotone decreasing sequence must be selected. In
optimal control applications, it has been determined empirically that
a reasonable choice of the monotone decreasing sequence is:

    \beta_\ell = c \exp(-\ell)

where

    c is a weighting parameter and \ell denotes the iteration
    number.

In  the  context  of  the  CM  method,  the  term  robustness  refers 
to  the  relative  insensitivity  of  the  method  to  the  particular  value  of 

c.  By  varying  the  weighting  parameter  c  over  a  wide  range,  say  10  <_  c 

-2 

<_  10  ,  the  robustness  of  the  CM  method  can  be  measured.  To  this 
effect, the CM method was used with PPC integration to solve the
NTPBVP associated with the Van der Pol system. The results obtained
are shown in Figure 5.3.

The results indicate that the CM algorithm was not very robust
at all. In fact, if the weighting parameter was greater than 10, the
CM method did not converge because the excessively large value of c
caused the forward integration of eqns. (5.1-4) - (5.1-7) to become
unstable. In view of these undesirable results, no further simulations
were considered.

5.1.2  Parallel  Shooting 

In this section, the convergence of the parallel shooting
algorithm presented in Section 2.3 will be studied by dividing the
mission time interval into many subintervals and determining whether
convergence may be accelerated. Recall from Chapter Two that as
more subintervals are introduced, the original NTPBVP becomes a
multipoint boundary value problem and the number of unknown boundary
conditions increases from 2n to n(2N-1), where n is the system order and
N is the number of subintervals. Despite this fact, the major advantage
of using parallel shooting is that the sensitivity problems associated
with the forward integration of the state and costate equations are
significantly reduced.

FIGURE 5.3: Robustness of CM Algorithm

In view of the above, a natural question then is "How should
N be chosen?" Although the parallel shooting method has been known
for a number of years, the question raised here has not been answered
satisfactorily.

One  way  to  answer  the  above  question,  which  is  the  approach 
taken  here,  is  to  select  a  value  of  N,  solve  the  resultant  multipoint 
boundary  value  problem  using  parallel  shooting,  and  record  the  number 
of  iterations  required  for  convergence  as  a  function  of  N.  To  deter¬ 
mine  if  this  procedure  would  indeed  provide  some  insight  into  how  to 
choose  the  number  of  subintervals,  the  mission  time  interval  was 
divided  into  2,  3,  and  5  subintervals  and  the  parallel  shooting  algo¬ 
rithm  discussed  in  Section  2.3  was  employed  to  solve  the  resultant 
multi-point  boundary  value  problem.  To  illustrate  the  procedure,  the 
two  subinterval  case  will  be  briefly  described. 

For the two subinterval case, the mission time interval
[0, 5] was arbitrarily partitioned into two subintervals of length
\Delta_1 = 1.6 and \Delta_2 = 3.4. Note that the sum of \Delta_1 and \Delta_2 is equal to the
length of the mission time interval since this is a necessary
constraint. The "reduced" error function which must be minimized subject
to the dynamic constraints given by eqns. (5.1-4) to (5.1-7) is given
by:*

    E = \| P Y_\ell + Q Y_r - \gamma \|^2    (5.1.2-1)

where

    P = \begin{bmatrix}
        0 & 0 & 0 & 0 & 0 & 0 \\
        0 & 0 & 0 & 0 & 0 & 0 \\
        0 & 0 & 1 & 0 & 0 & 0 \\
        0 & 0 & 0 & 1 & 0 & 0 \\
        0 & 0 & 0 & 0 & 1 & 0 \\
        0 & 0 & 0 & 0 & 0 & 1
    \end{bmatrix}

    Q = \begin{bmatrix}
        0 & 0 & 0 & 0 & 1 & 0 \\
        0 & 0 & 0 & 0 & 0 & 1 \\
        -1 & 0 & 0 & 0 & 0 & 0 \\
        0 & -1 & 0 & 0 & 0 & 0 \\
        0 & 0 & -1 & 0 & 0 & 0 \\
        0 & 0 & 0 & -1 & 0 & 0
    \end{bmatrix}

    Y_\ell = [\lambda_1(0)  \lambda_2(0)  x_1(1.6)  x_2(1.6)  \lambda_1(1.6)  \lambda_2(1.6)]^T

    Y_r = [x_1(1.6)  x_2(1.6)  \lambda_1(1.6)  \lambda_2(1.6)  \lambda_1(5.0)  \lambda_2(5.0)]^T

and

    \gamma = [\lambda_1(5.0)  \lambda_2(5.0)  0  0  0  0]^T

* Since x(t_0) = x_0 is known, there is no need to include it in Y_\ell.
Observe that this reduces the number of unknowns in Y_\ell by a factor of n.

To start the parallel shooting method, Y_\ell was initially set
to a zero vector, but upon convergence Y_\ell was determined to be:

    Y_\ell = (0.43019  5.1156  0.3189  -0.2012  1.871  -0.7814)^T
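
The structure of the reduced error function can be illustrated with a
short sketch. The first two rows of Q select the terminal costate
from Y_r, and the remaining rows of P and Q enforce continuity of the
state and costate at the interior mesh point t = 1.6. The code below
is an illustration only; the helper names are hypothetical, and
\gamma is taken as the zero vector because the required terminal
costate is zero by eqn. (5.1-8b).

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def build_p_q() -> Tuple[Matrix, Matrix]:
    """P and Q for two subintervals of a second-order system."""
    P = [[0.0] * 6 for _ in range(6)]
    Q = [[0.0] * 6 for _ in range(6)]
    Q[0][4] = 1.0                      # row 1 picks lambda_1(5.0) from Y_r
    Q[1][5] = 1.0                      # row 2 picks lambda_2(5.0) from Y_r
    for k in range(4):                 # continuity at the interior mesh point
        P[2 + k][2 + k] = 1.0
        Q[2 + k][k] = -1.0
    return P, Q


def matvec(M: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def reduced_error(y_l: Sequence[float], y_r: Sequence[float],
                  gamma: Sequence[float]) -> float:
    """E = ||P*Y_l + Q*Y_r - gamma||^2."""
    P, Q = build_p_q()
    resid = [p + q - g
             for p, q, g in zip(matvec(P, y_l), matvec(Q, y_r), gamma)]
    return sum(r * r for r in resid)
```

When the right-end values produced by integration match the left-end
unknowns of the next subinterval and the terminal costate vanishes,
reduced_error returns zero.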

Since the goal of this section is to determine how to choose
the number of subintervals, the above procedure was repeated by dividing
the mission time into three subintervals of length \Delta_1 = 1.0, \Delta_2 = 1.0,
and \Delta_3 = 3.0, and then into five subintervals of length \Delta_1 = \Delta_2 = \Delta_3 =
\Delta_4 = \Delta_5 = 1.0. Observe that, in each case, the constraint that the sum
of the subinterval lengths must be equal to the mission time was
imposed at all times. Finally, to be consistent with the two
subinterval case, Y_\ell was initially set to zero for both the three and five
subinterval cases.

The simulations were then performed. The results
shown in Figure 5.4 indicate:

•  The integration interval should be partitioned into as few
subintervals as possible since the number of iterations required for
convergence, as well as the number of unknowns, will be reduced
(see Figure 5.4).

•  The number of iterations required for convergence appears to
increase linearly with the number of subintervals for the DFP/PPC42
method while for the PVM/PPC42 method, the number of iterations
required for convergence tends to level off as the number of
subintervals is increased.

FIGURE 5.4: Convergence Properties of the Parallel
Shooting Algorithm

The last observation may be directly related to the accuracy
of the solution to the initial-value problems over each subinterval.
As the number of subintervals increases, the step size employed by the
PPC42 integration method decreases because the number of integration
steps taken over a given subinterval was held fixed. As a result, a
more accurate integration of the appropriate differential equations was
obtained over each subinterval. Thus, the gradients required by the
PVM and DFP methods were obtained more accurately as the number of
subintervals was increased. Since the PVM algorithm is known to
perform better when accurate gradients are utilized, this may explain
why the number of iterations required by the PVM/PPC42 method does not
increase very rapidly. On the other hand, since the performance of
the DFP method is generally not affected very much by inaccurate
gradient information, this may explain the linear trend shown in Figure
5.4 for the DFP/PPC42 method.

It is instructive to note that the dimension of Y_\ell in each
case was n(2N - 1). Since n = 2, this implies that 6, 10, and 18
unknowns must be found when 2, 3, and 5 subintervals are used,
respectively. Since the parallel algorithm converged in all cases,
this indicates that this method may be employed to solve high order
optimization problems.
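
The unknown counts quoted above follow directly from the expression
n(2N - 1). The short check below reproduces them, taking n = 2 for
the second-order Van der Pol system; the function name is
hypothetical.

```python
def num_unknowns(n: int, num_subintervals: int) -> int:
    """Dimension n(2N - 1) of the reduced parallel shooting problem."""
    return n * (2 * num_subintervals - 1)


# Van der Pol system (n = 2) with 2, 3 and 5 subintervals.
counts = [num_unknowns(2, N) for N in (2, 3, 5)]
```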

Finally, it should be noted that the results obtained in
this section are not very meaningful as is, because no insight has been
gained as to where the mesh points should be placed. However, if the
number of subintervals has been specified, then the adaptive mesh
selection algorithm discussed in Section 2.2.2.1 can be utilized to
allocate the mesh points in an optimal fashion. An example illustrating
the use of the adaptive mesh selection algorithm is presented in the
next section.

5.1.3  Adaptive  Mesh  Selection 

In this section, the adaptive mesh selection (AMS) algorithm
described in Section 2.2.2.1 will be employed to optimize the mesh
points required by the parallel shooting algorithm. In view of the results
obtained in the previous section, the mission time interval [0, 5] was
divided into only two subintervals. Hence, the problem was to
simultaneously find the optimal values of the subinterval lengths (\Delta_1 and \Delta_2)
and the solution to the resulting multi-point boundary value problem
subject to the dynamic constraints given by eqns. (5.1-4) to (5.1-8),
as well as the static constraints \Delta_1 > 0, \Delta_2 > 0, and \Delta_1 + \Delta_2 = 5.
To  use  the  AMS  algorithm,  one  must  decide  which  numerical 
integration  method  to  use  because  the  local  truncation  error  associated 


with the integration method selected is required. Since the parallel
predictor-corrector (PPC) method given by eqns. (3.2.1-4a) and (3.2.1-4b)
is accurate to O(h^4), this method was selected to integrate the
required initial-value problems.

To illustrate the use of the AMS algorithm, consider the
NTPBVP represented by eqns. (5.1-4) - (5.1-8) and the error function
below:

    E = \| P Y_\ell + Q Y_r - \gamma \|^2 + \| e \|^2    (5.1.3-1)

where

    Y_\ell = [\lambda_1(0)  \lambda_2(0)  x_1(\Delta_1)  x_2(\Delta_1)  \lambda_1(\Delta_1)  \lambda_2(\Delta_1)  \Delta_1]^T

    Y_r = [x_1(\Delta_1)  x_2(\Delta_1)  \lambda_1(\Delta_1)  \lambda_2(\Delta_1)  \lambda_1(5.0)  \lambda_2(5.0)]^T

and

    \gamma = [\lambda_1(5.0)  \lambda_2(5.0)  0  0  0  0]^T.


The first term shown in eqn. (5.1.3-1) is the usual error function
associated with the parallel shooting method while the second term is
included to allow the mesh points to be optimized. For the problem
under consideration, the error vector, e, has two components, namely:

    e_{j+1} = \max_{t_j \le t \le t_{j+1}} \| e_L(t) \|,    j = 0, 1    (5.1.3-2)

where e_L(t) is the local truncation error associated with the PPC42
integration method. Recall from Section 3.2.2 that the norm of e_L(t)
was derived to be:

    (5.1.3-3)

where in this case y = (x_1  x_2  \lambda_1  \lambda_2) and f consists of the
right-hand side (RHS) of \dot{x}_1, \dot{x}_2, \dot{\lambda}_1 and \dot{\lambda}_2.

In view of the above problem formulation, the parallel
algorithms discussed in Chapter Three were used to minimize eqn. (5.1.3-1)
subject to the dynamic constraints given by eqns. (5.1-4) - (5.1-8) and
the static constraints \Delta_1 > 0, \Delta_2 > 0, and \Delta_1 + \Delta_2 = 5. Based upon
previously obtained solutions, Y_\ell was initially selected as:

    Y_\ell = (0.4  5.0  0.25  -0.2  1.4  -0.9  2.5)

which resulted in an initial error function value of E = 0.9157 and
\| e \| = 0.01004. Based upon the definition of E, these values
indicate that initially most of the error was due to the choice of x and
\lambda at the partition points rather than the choice of the partition
points themselves. To reduce E, the DFP method was used in conjunction with
the PPC42 integration method. After ten iterations, the following
values of Y_\ell, E and \| e \| were obtained:

    Y_\ell = (0.4192  5.1201  0.1505  -0.2093  0.9562  -0.8581  2.334)

    E = 0.0093 and \| e \| = 0.00929

Observe that these results indicate that the error in the solution at
the partition points is only 0.0093 - 0.00929 = 0.00001 and that the
norm of the local truncation error, \| e \|, dominates over the solution
error. Also, note that after ten iterations, the subinterval lengths
had converged to \Delta_1 = 0.2334 and \Delta_2 = 5 - \Delta_1 = 4.7666.

To determine if the local truncation error could be reduced
still further, the AMS algorithm was allowed to execute for a total
of 50 iterations. After 50 iterations, the norm of the local truncation
error had been reduced to \| e \| = 0.007402 while the total error was E =
0.007724. Note that \| e \| had been reduced by about 23% from its initial
value of 0.01004, which is encouraging. The value of Y_\ell obtained after
50 iterations was:

    Y_\ell = (0.3285  5.0683  0.3354  -0.2084  2.0047  -0.7364  1.4378)

which indicates that the subinterval lengths should be selected as
\Delta_1 = 1.4378 and \Delta_2 = 3.5622.

In summary, it can be concluded from the results presented
in this section that by using the AMS algorithm to optimally select
the mesh points required by the parallel shooting algorithm, the local
truncation error can indeed be minimized, although many iterations
may be required. Observe that if the integration interval [0, 5] is
divided into two subintervals of lengths \Delta_1 = 1.6 and \Delta_2 = 3.4 and if
the parallel shooting algorithm is used, then it was shown in the
previous section that the optimal value of Y_\ell is given by:

    Y_\ell = (0.43019  5.1156  0.3189  -0.2012  1.871  -0.7814)^T

In this case, the norm of the local truncation error is \| e \|
= 0.00814, which is nearly 8% greater than the error obtained using
the adaptive mesh selection algorithm.

5.2  Evaluation of Estimator Performance

The performance of the parallel state and parameter
estimation algorithms was evaluated using simulated measurement data
characterizing the lateral motion of a T-33 aircraft. The equations of
motion used in the simulations were given by [42]:

    (5.2-1)

where the state variables x_1(t), x_2(t), x_3(t), and x_4(t) represent the
sideslip angle, roll rate, yaw rate and roll angle, respectively. The
aileron deflection angle, \delta_a, was selected as a one degree step command
for the purposes of identification.

The measurement model used in the simulation was selected
as:

    z(t) = x(t) + v(t)

where v(t) is a zero-mean WGN process with covariance

    Q(t) = \begin{bmatrix}
        0.004 & 0 & 0 & 0 \\
        0 & 4.0 & 0 & 0 \\
        0 & 0 & 0.017 & 0 \\
        0 & 0 & 0 & 3.4
    \end{bmatrix}    (5.2-2)

This value of Q(t) was selected since a ratio of signal
variance to noise variance of approximately two was desired.
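
Simulated measurements of this kind can be generated by adding
independent zero-mean Gaussian samples, with the variances on the
diagonal of the covariance above, to each component of the state. The
sketch below is an illustration only; the function name and the use
of Python's standard random number generator are assumptions made for
the example and do not describe how the original data were produced.

```python
import math
import random
from typing import List, Sequence

# Diagonal of the measurement noise covariance (sideslip angle, roll rate,
# yaw rate, roll angle).
NOISE_VARIANCES = (0.004, 4.0, 0.017, 3.4)


def simulate_measurement(x: Sequence[float], rng: random.Random) -> List[float]:
    """Return z = x + v, with v drawn from independent zero-mean Gaussians."""
    return [xi + rng.gauss(0.0, math.sqrt(q))
            for xi, q in zip(x, NOISE_VARIANCES)]
```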

The objective was to simultaneously estimate the four state
variables and the aerodynamic parameter vector

    \theta = (Y_\beta, \alpha_0, L_\beta, L_p, L_r, N_\beta, N_p, N_r, Y_{\delta_a}, L_{\delta_a}, N_{\delta_a})^T

However, to identify all of the aerodynamic parameters would be
impractical since the computational cost would be too high. Therefore, a
sensitivity study was conducted to determine those parameters which
most affect system performance and therefore are most important to
explicitly identify. More specifically, the state sensitivities:

    \Delta x_i(t) = \frac{\partial x_i(t)}{\partial \theta_j} \Delta\theta_j,    i = 1, 2, 3, 4;  j = 1, 2, ..., 11    (5.2-3)

were calculated and displayed to aid the decision process (see Figure
5.5). From Figure 5.5, it can be concluded that \alpha_0, L_p, L_{\delta_a}, and N_{\delta_a}
are the most important parameters to identify since the state
sensitivities associated with these parameters are much larger in magnitude
than those of the remaining parameters. Therefore, the remaining parameters can
be considered less important and set to their nominal values. Thus, we
shall be concerned with solving the reduced SAP estimation problem in
the remainder of this section.

5.2.1  Direct SAP Estimation

To find the unknown states and parameters associated with
the T-33 aircraft, the following performance index was considered:

    J = \frac{1}{2} \| x(0) - m_{x_0} \|^2_{P_{x_0}^{-1}} + \frac{1}{2} \int_0^1 \| z(t) - x(t) \|^2_{Q^{-1}(t)} dt    (5.2.1-1)

where

    m_{x_0} = (0  0  0  0  0.142  -6.51  -44.4  -1.8)^T

and

    P_{x_0} = 0.04 I_8.

The parallel algorithms discussed in Chapters Two and Three
were employed to find x(0) which minimizes eqn. (5.2.1-1) subject to
eqn. (5.2-1). To start the algorithm, the estimate of the augmented
state  vector  was  selected  to  be  x(0)  =  0  which  resulted  in  an  initial 
performance  of  J  =  1.008  x  10^. 
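
A performance index of this form can be evaluated numerically once
the measurement residuals z(t) - x(t) have been sampled along a
trajectory. The sketch below is an illustration only: it assumes that
the initial covariance is a scalar multiple of the identity, that the
measurement noise covariance is a constant diagonal matrix, and that
the integral is approximated by the trapezoidal rule on equally
spaced samples. The function names are hypothetical.

```python
from typing import List, Sequence


def weighted_sq_norm(v: Sequence[float], inv_weights: Sequence[float]) -> float:
    """Return sum_i inv_weights[i] * v[i]^2 (a diagonally weighted norm)."""
    return sum(w * vi * vi for vi, w in zip(v, inv_weights))


def sap_cost(x0: Sequence[float], m_x0: Sequence[float], p_x0: float,
             residuals: Sequence[Sequence[float]],
             q_diag: Sequence[float], dt: float) -> float:
    """Approximate J = 1/2 ||x0 - m||^2_{P^-1} + 1/2 int ||z - x||^2_{Q^-1} dt.

    residuals[k] holds z(t_k) - x(t_k) at the equally spaced times t_k = k*dt.
    """
    prior = weighted_sq_norm([a - b for a, b in zip(x0, m_x0)],
                             [1.0 / p_x0] * len(x0))
    q_inv = [1.0 / q for q in q_diag]
    g: List[float] = [weighted_sq_norm(r, q_inv) for r in residuals]
    integral = dt * (sum(g) - 0.5 * (g[0] + g[-1]))   # trapezoidal rule
    return 0.5 * prior + 0.5 * integral
```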


As indicated in Table 5.5, the parallel SAP estimation
algorithm managed to reduce the performance index more rapidly than
the serial DFP algorithm. In fact, the results in Table 5.5 indicate
that the serial DFP method did not converge after eight iterations
while the parallel method converged after seven iterations. Upon
convergence of the PVM method, the initial state and parameter
estimates compare very favorably with the true values, which were
x_1 = 0, x_2 = 0, x_3 = 0, x_4 = 0,
\alpha_0 = 0.142, L_p = -6.51, L_{\delta_a} = -4.44 and N_{\delta_a} = 1.8.

Using the estimated initial state and parameter vector, a simulation
of the T-33 aircraft was performed. The resulting trajectories along
with the simulated measurement data are shown in Figures 5.6-5.9.
Note that the estimated roll rate and roll angle trajectories are
essentially indistinguishable from the true trajectories while there
is only a small error associated with the sideslip and yaw rate
trajectories (see Figures 5.6 - 5.9).

5.2.2  Indirect SAP Estimation

To assess the performance of the indirect SAP estimation
algorithm discussed in Section 2.1.1, it was used to find the unknown
states and parameters of the T-33 aircraft. To use the indirect
method, the SAP estimation problem defined by:

    J = \frac{1}{2} \| x(t_0) - m_{x_0} \|^2_{P_{x_0}^{-1}} + \frac{1}{2} \int_{t_0}^{t_f} ( \| z(t) - h[x(t), t] \|^2_{Q^{-1}(t)} + \| w(t) \|^2_{R^{-1}(t)} ) dt

    \dot{x}(t) = f[x(t), t] + G[x(t), t] w(t)

and

must be reduced to a nonlinear two-point boundary value problem
(NTPBVP). Using eqns. (2.1.2-2), (2.1.2-3), (2.1.2-4) it is easy to
show that the NTPBVP associated with the T-33 aircraft is given by:

    \dot{x}(t) = a x(t) + b \delta_a    (5.2.2-1)

    \dot{\lambda}(t) = R^{-1}(t) (z(t) - x(t)) - a^T(t) \lambda(t)    (5.2.2-2)


where

The boundary conditions associated with eqns. (5.2.2-1) and
(5.2.2-2) are:

    \lambda(0) = -P_{x_0}^{-1} [x(0) - m_{x_0}]    (5.2.2-3a)

    \lambda(1) = 0    (5.2.2-3b)

In the problem formulation above, the augmented state vector, x(t),
includes the unknown parameters \alpha_0, L_p, L_{\delta_a}, and N_{\delta_a}.

To solve the NTPBVP represented by eqns. (5.2.2-1), (5.2.2-2),
(5.2.2-3a) and (5.2.2-3b), ordinary shooting was used initially.
However, due to the sensitivity of the costate eqn. (5.2.2-2) to small
changes in x(0), convergence was rather slow.

On the basis of these results, it was decided that parallel
shooting should be considered. Thus, the integration interval [0, 1]
was partitioned into two subintervals (\Delta_1 = 0.4 and \Delta_2 = 0.6) and
the parallel shooting algorithm discussed in Section 2.1.2 was employed
to minimize the reduced error function:*

    E = \| P Y_\ell + Q Y_r - \gamma \|^2    (5.2.2-4)

where

    Y_\ell = [x^T(0)  \lambda^T(0.4)  x^T(0.4)]^T

    Y_r = [\lambda^T(0.4)  x^T(0.4)  \lambda^T(1.0)]^T

and

    \gamma = 0

Note that by using parallel shooting, the dimension of the
problem is artificially increased from 2n = 16 to n(2N - 1) = 24.
Although the problem now appears more formidable to solve, this is not
the case because the sensitivity of the solution to the selection of
x(0) should be reduced, which might help convergence.

* Since \lambda(0) = -P_{x_0}^{-1} [x(0) - m_{x_0}] is known once x(0) is given, there is
no need to include it in Y_\ell. Observe that this reduces the number of
unknowns in Y_\ell by a factor of n.

In fact, the results shown in Tables 5.6 and 5.7 indicate
that convergence can occur using parallel shooting; however, the
number of iterations required was still rather large. Note that the
results shown in Tables 5.6 and 5.7 clearly indicate that the gradient
dependent DFP algorithm is preferable to the nongradient ZP method.*

5.2.3  Timing Considerations

In Section 4.2, the execution time of the parallel (and
serial) algorithms was estimated in terms of variables representing
the number of additions, multiplications, function and gradient
evaluations. In this section, these results are employed to estimate the
time required to simultaneously estimate the state and parameters of
the T-33 aircraft's equations of motion. Although this can be done
using any of the timing equations given in Section 4.3, the timing
results will be explicitly calculated assuming the PVM minimization
method and the fourth-order PPC integration method are used in the
direct SAP estimation algorithm given in Section 2.1.

For the T-33 aircraft, 10 additions and 13 multiplications
must be performed when evaluating the RHS of the aircraft's equations
of motion. Also, suppose the parallel predictor-corrector (PPC42)
method requires 100 integration steps to integrate the appropriate
differential equations over the integration interval [0, 1]. By
substituting A = 10, M = 13 and N = 100 into eqn. (4.2.2-5), an
estimate of the time required for one function evaluation can be
obtained as follows:

    t_{fe} = 1400 t_a + 1800 t_m    (5.2.3-1)

* The parallel minimization algorithms were not considered here due to
the enormous expense which would be incurred when simulating these
methods on a serial computer.

If the direct state and parameter (SAP) estimation procedure is
utilized, then n = 8 and the execution time for one gradient
evaluation is:

    t_{ge} = n t_{fe} = 11,200 t_a + 14,400 t_m    (5.2.3-2)

From Table 5.5, four function evaluations are performed during a line
search on the average. Hence, let L = 4. By substituting eqns.
(5.2.3-1) and (5.2.3-2) into eqn. (4.2.1-5), an estimate of the
execution time for one iteration of Straeter's PVM algorithm may be
obtained as follows:

    t_{pi} = 23,638 t_m + 18,413 t_a    (5.2.3-3)

To illustrate the speed achievable through parallelism,
suppose the processor add and multiply times are $t_a$ = 200 nsec and
$t_m$ = 1000 nsec, respectively. By substituting these values of $t_a$
and $t_m$ into eqn. (5.2.3-3), the execution time per iteration is only
0.0273 seconds. Note that this time may be reduced still further if
the PPC42 integration method is used to integrate each state equation
on separate processors. In particular, if 16 processors (two for
each state equation) were available, the function and gradient
evaluation times would be reduced to:

$$ t_{fe} = 175\,t_a + 225\,t_m \tag{5.2.3-4} $$

and

$$ t_{ge} = 1400\,t_a + 1800\,t_m \tag{5.2.3-5} $$

By substituting eqns. (5.2.3-4) and (5.2.3-5) into eqn. (4.2.1-5), the
estimated execution time for one iteration of the PVM algorithm would
be reduced to 0.0036 seconds.
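
The arithmetic behind the single-iteration estimate can be checked with a few lines of Python. This is only a sketch: the operation counts are those quoted in eqns. (5.2.3-1) to (5.2.3-3), while the helper name `op_time` and the variable names are assumptions introduced here for illustration. The 16-processor figure is not reproduced because it depends on eqn. (4.2.1-5), which is not restated in this section.

```python
# Operation times assumed in the text: 200 nsec per add, 1000 nsec per multiply.
T_ADD = 200e-9
T_MUL = 1000e-9


def op_time(n_add, n_mul, t_add=T_ADD, t_mul=T_MUL):
    """Time in seconds for n_add additions and n_mul multiplications."""
    return n_add * t_add + n_mul * t_mul


n = 8                          # value of n used for direct SAP estimation
t_fe = op_time(1400, 1800)     # one function evaluation, eqn. (5.2.3-1)
t_ge = n * t_fe                # one gradient evaluation, eqn. (5.2.3-2)
t_pi = op_time(18413, 23638)   # one PVM iteration, eqn. (5.2.3-3)

print(f"t_fe = {t_fe:.5f} s, t_ge = {t_ge:.5f} s, t_pi = {t_pi:.4f} s")
```

With the stated add and multiply times, `t_pi` evaluates to about 0.0273 seconds, in agreement with the figure quoted in the text.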

By using the timing equations derived in Section 4.2, and
the procedure described above, the execution time of many other
algorithms may be estimated. To provide a basis for comparison, this was
done for both direct and indirect SAP estimation procedures and the
results are shown in Tables 5.8 and 5.9.

The  results  indicate: 

•  The  execution  time  per  iteration  can  be  significantly  reduced  if 
a  completely  parallel  (i.e.,  parallel  minimization  and  parallel 
integration)  algorithm  is  used. 

•  One iteration of the nongradient algorithms requires less time to
execute compared with the gradient-dependent methods.

•  One iteration of the indirect method requires much more time and
more processors to execute compared with the direct method (see
Tables 5.8 and 5.9).

Finally, the speed-up due to parallelism is illustrated in
Figure 5.10, which shows the speed-up per iteration as a function of
the number of processors.

5.3  Evaluation of Controller Performance

The performance of the parallel nonlinear control algorithms
presented in Section 2.2 was evaluated by designing a control
system for controlling the longitudinal motion of an F-8 Crusader
aircraft. The longitudinal equations of motion of the F-8 aircraft
were obtained from the aircraft model shown in Figure 5.11. The
aircraft model illustrates the forces which were considered and the
coordinate system used by Garrard and Jordan in reference [43].

TABLE 5.8:  T-33 AIRCRAFT

FIGURE 5.10:  T-33 Aircraft:  Indirect SAP Estimation Speed Up


It is assumed that the aerodynamic drag is negligible
and that the lift may be separated into its wing and tail components.
Under these conditions, it can be shown (cf. [43]) that the
longitudinal equations of motion become:

$$ m(\dot{u} + w\,\dot{\theta}) = -mg \sin\theta + L_w \sin\alpha + L_t \sin\alpha_t \tag{5.3-1} $$

$$ m(\dot{w} + u\,\dot{\theta}) = mg \cos\theta - L_w \cos\alpha - L_t \cos\alpha_t \tag{5.3-2} $$

$$ I_y\,\ddot{\theta} = M_w + l\,L_w \cos\alpha - l_t\,L_t \cos\alpha_t - c\,\dot{\theta} \tag{5.3-3} $$

where

    $m$ = mass of aircraft
    $u$ = velocity of aircraft in X direction
    $w$ = velocity of aircraft in Z direction
    $\theta$ = angular displacement about Y axis, measured clockwise
          from the horizon as shown in Figure 5.11
    $I_y$ = moment of inertia of aircraft about Y axis
    $L_w$ = wing lift
    $L_t$ = tail lift
    $\alpha$ = wing angle of attack
    $\alpha_t$ = tail angle of attack
    $M_w$ = wing moment
    $l$ = distance between wing aerodynamic center and aircraft
          center of gravity
    $l_t$ = distance between tail aerodynamic center and aircraft
          center of gravity
    $c\,\dot{\theta}$ = damping moment.

In reference [43], Garrard and Jordan reduce eqns. (5.3-1) -
(5.3-3) to a nonlinear dynamical model in which cubic and lower order
terms are retained for the F-8 aircraft. This model of the F-8 aircraft,
which is given below, can be used to study the effect disturbances have
on the F-8 aircraft when it is perturbed from level, unaccelerated
flight at Mach = 0.85 and an altitude of 30,000 feet.


For small angle of attack disturbances ($\alpha < 23.5^\circ = 0.41$
radians), the F-8 aircraft model is given by:

$$ \dot{x}_1 = (1 - x_1^2 - 0.088\,x_1)\,x_3 - 0.877\,x_1 + 0.47\,x_1^2 + 3.846\,x_1^3 - 0.215\,u + 0.28\,u\,x_1^2 + 0.47\,u^2 x_1 + 0.63\,u^3 - 0.019\,x_2^2 \tag{5.3-4} $$

$$ \dot{x}_2 = x_3 \tag{5.3-5} $$

$$ \dot{x}_3 = -0.396\,x_3 - 4.208\,x_1 - 0.47\,x_1^2 - 3.564\,x_1^3 - 20.967\,u + 6.265\,u\,x_1^2 + 46\,u^2 x_1 + 61.4\,u^3 \tag{5.3-6} $$

while for large angle of attack disturbances ($\alpha \ge 23.5^\circ = 0.41$
radians) the F-8 aircraft model becomes:

$$ \dot{x}_1 = (1 - x_1^2 - 0.088\,x_1)\,x_3 - 0.019\,x_2^2 - 0.053\,x_1 + 0.006\,x_1^2 + 0.049\,x_1^3 - 0.215\,u + 0.28\,u\,x_1^2 + 0.47\,u^2 x_1 + 0.63\,u^3 \tag{5.3-7} $$

$$ \dot{x}_2 = x_3 \tag{5.3-8} $$

$$ \dot{x}_3 = -0.396\,x_3 - 5.116\,x_1 - 0.042\,x_1^2 - 0.32\,x_1^3 - 20.967\,u + 6.265\,u\,x_1^2 + 46\,u^2 x_1 + 61.4\,u^3 \tag{5.3-9} $$

The state variables, $x_1$, $x_2$, and $x_3$, represent the angle
of attack, pitch angle, and pitch rate, respectively. The tail
deflection angle, u, is the control variable which must be designed to
reduce an angle of attack disturbance as rapidly as possible.
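
For readers who wish to experiment with the model, the right-hand sides of eqns. (5.3-4) through (5.3-9) can be written as a single Python function. This is a sketch under stated assumptions: the function name `f8_rhs` and the constant `ALPHA_SWITCH` are introduced here, the switch between the two models is taken at 0.41 radians, and the powers of $x_1$ in the high angle of attack branch are assigned by analogy with the low angle of attack model.

```python
ALPHA_SWITCH = 0.41  # radians (23.5 degrees); assumed switching point


def f8_rhs(x, u):
    """Return (dx1, dx2, dx3) for state x = (x1, x2, x3) and control u."""
    x1, x2, x3 = x
    # Terms shared by both angle of attack regimes.
    dx1_common = ((1.0 - x1**2 - 0.088 * x1) * x3 - 0.019 * x2**2
                  - 0.215 * u + 0.28 * u * x1**2
                  + 0.47 * u**2 * x1 + 0.63 * u**3)
    dx3_common = (-0.396 * x3 - 20.967 * u + 6.265 * u * x1**2
                  + 46.0 * u**2 * x1 + 61.4 * u**3)
    if x1 < ALPHA_SWITCH:
        # Low angle of attack model, eqns. (5.3-4) and (5.3-6).
        dx1 = dx1_common - 0.877 * x1 + 0.47 * x1**2 + 3.846 * x1**3
        dx3 = dx3_common - 4.208 * x1 - 0.47 * x1**2 - 3.564 * x1**3
    else:
        # High angle of attack model, eqns. (5.3-7) and (5.3-9);
        # the exponent placement here is an assumption.
        dx1 = dx1_common - 0.053 * x1 + 0.006 * x1**2 + 0.049 * x1**3
        dx3 = dx3_common - 5.116 * x1 - 0.042 * x1**2 - 0.32 * x1**3
    return (dx1, x3, dx3)


print(f8_rhs((0.1, 0.0, 0.0), 0.0))
```

The origin is an equilibrium of both branches when u = 0, which is the trim condition the controllers in this section regulate about.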

In the remainder of this section, an open loop and a closed
loop controller will be designed and evaluated using the procedures
discussed in Chapter Two. Also, the computation time required by the
control synthesis algorithms will be estimated.

5.3.1  Open and Closed Loop Control Synthesis

In this section, an open loop controller is designed by
solving a nonlinear two-point boundary value problem (NTPBVP)
associated with the F-8 aircraft's longitudinal equations of motion. Also
in this section, two closed loop controllers are designed: the first
using linear quadratic regulator (LQR) theory and the second using the
direct gain optimization algorithm described in Section 2.2.2. The
controllers are designed assuming the angle of attack remains below
23.5° so that the low angle of attack model given by eqns. (5.3-4),
(5.3-5) and (5.3-6) could be used.

Linear  Controller  Synthesis 

To use LQR theory to design a controller for the F-8
aircraft, the equations of motion must be linearized. Linearizing eqns.
(5.3-4), (5.3-5), and (5.3-6) results in the following linear model
of the F-8 aircraft:

$$ \dot{x}(t) = \begin{bmatrix} -0.877 & 0 & 1.0 \\ 0 & 0 & 1.0 \\ -4.208 & 0 & -0.396 \end{bmatrix} x(t) + \begin{bmatrix} -0.215 \\ 0 \\ -20.967 \end{bmatrix} u(t) \tag{5.3.1-1} $$

which is of the form

$$ \dot{x}(t) = A\,x(t) + b\,u(t) \tag{5.3.1-2} $$

The optimal control must be determined to minimize the quadratic
performance index

$$ J = \frac{1}{2} \int_0^\infty \left[ x^T(t)\,Q\,x(t) + r\,u^2(t) \right] dt \tag{5.3.1-3} $$

subject to the dynamic constraint given by eqn. (5.3.1-2). From
LQR theory, it is well known that the optimal control is given by [44]

$$ u(t) = -r^{-1}\,b^T P\,x(t) \tag{5.3.1-4} $$

where P is the positive definite solution of the steady state matrix
Riccati equation

$$ A^T P + P A - P\,b\,r^{-1}\,b^T P + Q = 0. \tag{5.3.1-5} $$

The Q matrix and the scalar r were selected as

$$ Q = \mathrm{diag}(0.25,\ 0.25,\ 0.25), \qquad r = 1.0 \tag{5.3.1-6} $$

since this choice of Q and r gave good response without exceeding a
maximum tail deflection of 25° (0.4365 radians) and a tail deflection
rate of 60°/sec (1.0472 radians/sec). The optimal control problem
above was solved using ORACLE - a collection of optimal regulator
algorithms for the control of linear systems [45].

The resultant control law was determined to be:

$$ u(t) = -0.053\,x_1(t) + 0.5\,x_2(t) + 0.521\,x_3(t), \qquad t \ge 0 \tag{5.3.1-7} $$
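
The steady state Riccati solution can be approximated without a control library by integrating the matrix Riccati differential equation until it stops changing. The sketch below is not the ORACLE package used in the report; it is a plain Python illustration that assumes the weights Q = 0.25 I and r = 1.0 (the values implied by the Hamiltonian used later in this section) and uses a simple Euler scheme whose step size and step count are arbitrary choices made here.

```python
# Linearized F-8 model, eqn. (5.3.1-2).
A = [[-0.877, 0.0, 1.0],
     [0.0, 0.0, 1.0],
     [-4.208, 0.0, -0.396]]
b = [-0.215, 0.0, -20.967]
Q = [[0.25, 0.0, 0.0],
     [0.0, 0.25, 0.0],
     [0.0, 0.0, 0.25]]   # assumed weighting matrix
r = 1.0                  # assumed control weight
N = 3


def riccati_rhs(P):
    """Return A^T P + P A - P b b^T P / r + Q for a symmetric P."""
    Pb = [sum(P[i][k] * b[k] for k in range(N)) for i in range(N)]
    R = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            atp = sum(A[k][i] * P[k][j] for k in range(N))
            pa = sum(P[i][k] * A[k][j] for k in range(N))
            R[i][j] = atp + pa - Pb[i] * Pb[j] / r + Q[i][j]
    return R


def steady_state_riccati(dt=2e-3, steps=20000):
    """Euler-integrate dP/dtau = riccati_rhs(P) from P = 0 to steady state."""
    P = [[0.0] * N for _ in range(N)]
    for _ in range(steps):
        R = riccati_rhs(P)
        P = [[P[i][j] + dt * R[i][j] for j in range(N)] for i in range(N)]
    return P


P = steady_state_riccati()
# Feedback law u = gains . x, with gains = -(1/r) b^T P as in eqn. (5.3.1-4).
gains = [-sum(b[k] * P[k][j] for k in range(N)) / r for j in range(N)]
residual = max(abs(v) for row in riccati_rhs(P) for v in row)
print("gains:", gains, "Riccati residual:", residual)
```

Because the second column of A is zero, the (2,2) entry of the Riccati equation forces the magnitude of the pitch angle gain to equal the square root of 0.25/r, that is 0.5, which matches the coefficient of $x_2$ in eqn. (5.3.1-7).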

To determine if a "better" controller could be designed by
utilizing the nonlinear equations of motion of the F-8 aircraft
directly, the direct gain optimization procedure discussed in Section
2.2.2 was employed. Since we want the controller to be linear and
utilize feedback, the controller is constrained to be of the form:

$$ u(t) = K_1\,x_1(t) + K_2\,x_2(t) + K_3\,x_3(t), \qquad t \ge 0 \tag{5.3.1-8} $$

where $K_1$, $K_2$, and $K_3$ are constant gains which must be determined.


The optimization problem, in this case, is to find the
control gains which minimize the performance index:

$$ J = \frac{1}{2} \int \left[ x^T(t)\,Q\,x(t) + r\,u^2(t) \right] dt \tag{5.3.1-9} $$

subject to the nonlinear equations of motion given by eqns. (5.3-4),
(5.3-5), and (5.3-6). The matrix Q and the scalar r were selected to
be the same as in the LQR design.

At this point, the direct gain optimization procedure could
be used to find the optimal values of $K_1$, $K_2$, and $K_3$ needed to define
the control. Initially, $K_1$, $K_2$, and $K_3$ were set to zero and updated
by the Davidon-Fletcher-Powell (DFP) method until the performance index
(5.3.1-9) was minimized. After twenty iterations of the direct gain
optimization procedure, the gains had converged to their optimal values,
which when substituted into eqn. (5.3.1-8) yield:

$$ u(t) = 0.1368\,x_1(t) + 0.1331\,x_2(t) + 0.6797\,x_3(t), \qquad t \ge 0 \tag{5.3.1-10} $$
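
The quantity that the direct gain optimization procedure minimizes can be evaluated by simulating the nonlinear model under a fixed-gain feedback law and accumulating the running cost. The sketch below illustrates one such evaluation and is not the report's algorithm: the function names, the fourth-order Runge-Kutta integrator, the step size, the final time of 5 seconds, the initial state (0.349, 0, 0) and the weights q = 0.25, r = 1.0 are all assumptions made here for illustration.

```python
def f8_low(x, u):
    """Low angle of attack F-8 model, eqns. (5.3-4) to (5.3-6)."""
    x1, x2, x3 = x
    dx1 = ((1.0 - x1**2 - 0.088 * x1) * x3 - 0.877 * x1 + 0.47 * x1**2
           + 3.846 * x1**3 - 0.215 * u + 0.28 * u * x1**2
           + 0.47 * u**2 * x1 + 0.63 * u**3 - 0.019 * x2**2)
    dx3 = (-0.396 * x3 - 4.208 * x1 - 0.47 * x1**2 - 3.564 * x1**3
           - 20.967 * u + 6.265 * u * x1**2 + 46.0 * u**2 * x1 + 61.4 * u**3)
    return (dx1, x3, dx3)


def performance_index(K, x0=(0.349, 0.0, 0.0), t_final=5.0, dt=0.005,
                      q=0.25, r=1.0):
    """Integrate the closed loop and J = 1/2 int (q x.x + r u^2) dt together."""

    def deriv(z):
        x = z[:3]
        u = sum(k * xi for k, xi in zip(K, x))          # u = K1 x1 + K2 x2 + K3 x3
        running = 0.5 * (q * sum(xi * xi for xi in x) + r * u * u)
        return f8_low(x, u) + (running,)                # cost is a fourth state

    def step(z, dz, h):
        return tuple(zi + h * di for zi, di in zip(z, dz))

    z = tuple(x0) + (0.0,)
    for _ in range(int(round(t_final / dt))):
        k1 = deriv(z)
        k2 = deriv(step(z, k1, 0.5 * dt))
        k3 = deriv(step(z, k2, 0.5 * dt))
        k4 = deriv(step(z, k3, dt))
        z = tuple(zi + dt * (a + 2.0 * b2 + 2.0 * c + d) / 6.0
                  for zi, a, b2, c, d in zip(z, k1, k2, k3, k4))
    return z[3]


J_zero = performance_index((0.0, 0.0, 0.0))           # starting gains
J_dgo = performance_index((0.1368, 0.1331, 0.6797))   # gains of eqn. (5.3.1-10)
print("J (zero gains) =", J_zero, " J (optimized gains) =", J_dgo)
```

A minimizer such as DFP would call `performance_index` repeatedly, once per function evaluation, which is why the cost of one simulation dominates the timing estimates discussed later in the chapter.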


Nonlinear  Controller  Synthesis 

To determine how well the linear feedback controllers
defined by eqns. (5.3.1-8) and (5.3.1-10) approximate the optimal control,
the calculus of variations approach was considered. In this case, the
problem is to find an open loop control which minimizes eqn. (5.3.1-9)
subject to the satisfaction of eqns. (5.3-4) - (5.3-6). Recall that,
when the calculus of variations is used to solve an optimal control
problem, it is necessary to solve a nonlinear two-point boundary value
problem (NTPBVP). In this case, the NTPBVP which must be solved is
easily shown to be:
State Equations:

$$ \dot{x}_1 = (1 - x_1^2 - 0.088\,x_1)\,x_3 - 0.877\,x_1 + 0.47\,x_1^2 + 3.846\,x_1^3 - 0.215\,u + 0.28\,u\,x_1^2 + 0.47\,u^2 x_1 + 0.63\,u^3 - 0.019\,x_2^2 \tag{5.3.1-11} $$

$$ \dot{x}_2 = x_3 \tag{5.3.1-12} $$

$$ \dot{x}_3 = -0.396\,x_3 - 4.208\,x_1 - 0.47\,x_1^2 - 3.564\,x_1^3 - 20.967\,u + 6.265\,u\,x_1^2 + 46\,u^2 x_1 + 61.4\,u^3 \tag{5.3.1-13} $$

Costate Equations:

$$ \dot{\lambda}_1 = -0.25\,x_1 + \lambda_1 (2\,x_1 x_3 + 0.088\,x_3 + 0.877 - 0.94\,x_1 - 11.538\,x_1^2 - 0.56\,u\,x_1 - 0.47\,u^2) + \lambda_3 (4.208 + 0.56\,x_1 + 10.692\,x_1^2 - 12.53\,u\,x_1 + 46\,u^2) \tag{5.3.1-14} $$

$$ \dot{\lambda}_2 = -0.25\,x_2 + 0.038\,\lambda_1 x_2 \tag{5.3.1-15} $$

$$ \dot{\lambda}_3 = -0.25\,x_3 - \lambda_1 (1 - x_1^2 - 0.088\,x_1) - \lambda_2 + 0.396\,\lambda_3 \tag{5.3.1-16} $$

Boundary Conditions:

$$ x(0) = \begin{bmatrix} 0.349 \\ 0 \\ 0 \end{bmatrix}, \qquad \lambda(0) = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{5.3.1-17} $$

Let the Hamiltonian be defined as:

$$ H(x, u, \lambda, t) = \frac{1}{2} \left[ 0.25\,(x_1^2 + x_2^2 + x_3^2) + u^2 \right] + \lambda_1 \dot{x}_1 + \lambda_2 \dot{x}_2 + \lambda_3 \dot{x}_3 $$

Then the optimal control must satisfy the necessary condition:

$$ H_u = (1.89\,\lambda_1 + 184.2\,\lambda_3)\,u^2 + (1.0 + 0.94\,x_1 \lambda_1 + 92\,x_1 \lambda_3)\,u + \lambda_1 (0.28\,x_1^2 - 0.215) + \lambda_3 (6.265\,x_1^2 - 20.967) = 0 \tag{5.3.1-18} $$

If we let

$$ A = 1.89\,\lambda_1 + 184.2\,\lambda_3 $$

$$ B = 1.0 + 0.94\,x_1 \lambda_1 + 92\,x_1 \lambda_3 $$

and

$$ C = \lambda_1 (0.28\,x_1^2 - 0.215) + \lambda_3 (6.265\,x_1^2 - 20.967), $$

then the necessary condition becomes:

$$ A u^2 + B u + C = 0 $$

which implies that the optimal control, u, is given by:

$$ u = \frac{-B \pm \sqrt{B^2 - 4AC}}{2A} \tag{5.3.1-19} $$

From optimal control theory, it is well known that a
sufficient condition for optimality is $H_{uu} > 0$. Thus,

$$ H_{uu} = 2Au + B > 0 \tag{5.3.1-20} $$

Substituting eqn. (5.3.1-19) into eqn. (5.3.1-20), we have

$$ 2Au + B = \pm \sqrt{B^2 - 4AC} $$

which is positive only if the positive square root is used. Therefore,
the optimal control is given by:

$$ u = \frac{-B + \sqrt{B^2 - 4AC}}{2A} \tag{5.3.1-21} $$

Note that the optimal control is relatively complex to implement
due to the square root operation required. Furthermore, to implement
this controller, the solution to the NTPBVP must be known a priori
in order to evaluate A, B and C. Observe that this is not an easy
task because as the initial costate is adjusted from iteration to
iteration, the term $B^2 - 4AC$ may become negative, in which case the
optimal control is undefined. Unfortunately, this occurred when
numerical methods were applied to this problem.
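
The root selection and the failure mode described above can be made concrete with a short Python sketch. The function names `coefficients` and `optimal_control` are introduced here for illustration, and returning `None` when the discriminant is negative is an assumed convention, not something taken from the report. The degenerate case A = 0, in which the condition is linear in u, is handled separately as an added safeguard.

```python
import math


def coefficients(x1, lam1, lam3):
    """Coefficients A, B, C of the necessary condition A u^2 + B u + C = 0."""
    a = 1.89 * lam1 + 184.2 * lam3
    b = 1.0 + 0.94 * x1 * lam1 + 92.0 * x1 * lam3
    c = lam1 * (0.28 * x1**2 - 0.215) + lam3 * (6.265 * x1**2 - 20.967)
    return a, b, c


def optimal_control(x1, lam1, lam3):
    """Root of H_u = 0 with H_uu = 2 A u + B > 0, or None if none exists."""
    a, b, c = coefficients(x1, lam1, lam3)
    if a == 0.0:
        # Condition reduces to B u + C = 0 and H_uu = B.
        return -c / b if b > 0.0 else None
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None   # B^2 - 4AC < 0: the control is undefined
    return (-b + math.sqrt(disc)) / (2.0 * a)


u = optimal_control(0.0, 0.0, 0.01)
a, b, c = coefficients(0.0, 0.0, 0.01)
print("u =", u, " H_u =", a * u * u + b * u + c, " H_uu =", 2.0 * a * u + b)
```

In a shooting method the costates change at every iteration, so a guard of this kind is needed wherever the control is evaluated along the trajectory.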

One way to overcome this difficulty is to use the approach
taken by Garrard and Jordan [43], who approximated eqns. (5.3-4),
(5.3-5), and (5.3-6) by eliminating the $u^3$, $u^2$ and cross product terms.
In this case, the approximate equations of motion which are valid for
low angles of attack are simply:


$$ \dot{x}_1 = (1 - x_1^2 - 0.088\,x_1)\,x_3 - 0.877\,x_1 + 0.47\,x_1^2 + 3.846\,x_1^3 - 0.019\,x_2^2 - 0.215\,u \tag{5.3.1-22} $$

$$ \dot{x}_2 = x_3 \tag{5.3.1-23} $$

$$ \dot{x}_3 = -0.396\,x_3 - 4.208\,x_1 - 0.47\,x_1^2 - 3.564\,x_1^3 - 20.967\,u \tag{5.3.1-24} $$


If this model of the F-8 aircraft is employed in the design of the
open loop controller, the costate equations and control are given by:

Costate Equations:

$$ \dot{\lambda}_1 = -0.25\,x_1 + \lambda_1 (2\,x_1 x_3 + 0.088\,x_3 + 0.877 - 0.94\,x_1 - 11.538\,x_1^2) + \lambda_3 (4.208 + 0.56\,x_1 + 10.692\,x_1^2) \tag{5.3.1-25} $$

$$ \dot{\lambda}_2 = -0.25\,x_2 + 0.038\,\lambda_1 x_2 \tag{5.3.1-26} $$

$$ \dot{\lambda}_3 = -0.25\,x_3 - \lambda_1 (1 - x_1^2 - 0.088\,x_1) - \lambda_2 + 0.396\,\lambda_3 \tag{5.3.1-27} $$

Optimal Control:

$$ u = 0.215\,\lambda_1 + 20.967\,\lambda_3 \tag{5.3.1-28} $$

Note that in this case the optimal control (for the approximate
F-8 model) is extremely simple to compute provided $\lambda_1$ and $\lambda_3$ are
known. These quantities were obtained rather easily by solving the
NTPBVP represented by eqns. (5.3.1-22) - (5.3.1-28). This was achieved
by incorporating the serial DFP minimization algorithm and the APC
integration method into the indirect control algorithm described in
Section 2.2.2.1.* The resulting solution is, of course, optimal for the
approximate F-8 model and is shown along with the LQR control and
closed loop control designed using the direct gain optimization
procedure in Figures 5.12 - 5.15. The trajectories displayed in Figures
5.12 - 5.15 indicate:

•  The  response  of  the  F-8  aircraft  due  to  the  LQR  control  is  signifi¬ 
cantly  different  from  that  due  to  the  open  and  closed  loop  controls. 
This  may  be  attributed  to  using  a  linearized  model  of  the  F-8 
aircraft  in  the  design  process. 

•  The  response  of  the  F-8  aircraft  due  to  the  closed  loop  control 
designed  using  the  low  angle  of  attack  model  compares  very  favorably 
with  that  due  to  the  open  loop  control  which  is  optimal  for  the 
approximate  F-8  model. 

The second result, along with the fact that the closed loop
control utilizes feedback while the optimal control does not, indicates
that the closed loop control may be preferable in this case. Finally,
it should be emphasized that the closed loop control was designed
without approximating the nonlinear equations of motion of the F-8 aircraft.
This attribute of the direct gain optimization procedure is studied
further in the next section in which a number of feedback controllers
are compared.

* Parallel algorithms were not considered here because only the
effectiveness of the resultant control was being studied, not the method
used to obtain it.

FIGURE 5.13:  F-8 Aircraft:  Angle of Attack Trajectories

FIGURE 5.15:  F-8 Aircraft:  Pitch Rate Trajectories

5.3.2  Feedback Control Laws

In this section, several feedback controllers are designed
and compared at high angles of attack. The problem of interest this
time is to determine the feedback control law u(t) = g(x(t)) which
minimizes

$$ J = \frac{1}{2} \int_0^5 \left[ x^T Q\,x + r\,u^2 \right] dt \tag{5.3.2-1} $$

subject to the F-8 aircraft equations of motion given by eqns. (5.3-4) -
(5.3-9). In this example, the initial state was $x_0 = (0.575\ \ 0\ \ 0)$
and the matrix Q and the scalar r were selected as:

$$ Q = \begin{bmatrix} 0.25 & 0 & 0 \\ 0 & 0.25 & 0 \\ 0 & 0 & 0.25 \end{bmatrix} \qquad \text{and} \qquad r = 1.0 $$

Note that both high and low angle of attack models are used in this
case.

The control problem above was originally considered by
Garrard and Jordan, who used LQR and perturbation theory to design the
following flight controllers [43]:

•  Linear Control

$$ u = -0.053\,x_1 + 0.5\,x_2 + 0.521\,x_3 \tag{5.3.2-2} $$

•  Quadratic Control

$$ u = -0.053\,x_1 + 0.5\,x_2 + 0.521\,x_3 + 0.04\,x_1^2 - 0.048\,x_1 x_2 \tag{5.3.2-3} $$

•  Cubic Control

$$ u = -0.053\,x_1 + 0.5\,x_2 + 0.521\,x_3 + 0.04\,x_1^2 - 0.048\,x_1 x_2 + 0.374\,x_1^3 - 0.31\,x_1^2 x_2 \tag{5.3.2-4} $$
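
The three control laws nest inside one another, which is easy to see when they are written as functions. The following Python sketch is illustrative only: the function names are introduced here, and the assignment of subscripts and exponents in the quadratic and cubic terms follows the reading of eqns. (5.3.2-3) and (5.3.2-4) adopted in this section.

```python
def linear_control(x):
    """Linear control, eqn. (5.3.2-2)."""
    x1, x2, x3 = x
    return -0.053 * x1 + 0.5 * x2 + 0.521 * x3


def quadratic_control(x):
    """Linear law plus the second-order terms, eqn. (5.3.2-3)."""
    x1, x2, _ = x
    return linear_control(x) + 0.04 * x1**2 - 0.048 * x1 * x2


def cubic_control(x):
    """Quadratic law plus the third-order terms, eqn. (5.3.2-4)."""
    x1, x2, _ = x
    return quadratic_control(x) + 0.374 * x1**3 - 0.31 * x1**2 * x2


x = (0.1, 0.2, 0.3)   # arbitrary test state, chosen here for illustration
print(linear_control(x), quadratic_control(x), cubic_control(x))
```

None of the three laws contains terms in $u^2$, $u^3$ or products of x and u, which is the limitation the direct gain optimization approach is intended to address.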

Because these designs do not account for the quadratic and cubic
control terms, as well as the cross terms involving x and u appearing in
eqns. (5.3-4), (5.3-6), (5.3-7), and (5.3-9), it seems plausible that a
better controller might be designed. To show this, suppose the
controller is constrained to be of the form:

$$ u(t) = K_1\,x_1(t) + K_2\,x_2(t) + K_3\,x_3(t), \qquad t \ge 0 \tag{5.3.2-5} $$

If the direct gain optimization procedure described in
Section 2.2.3 is employed, then the optimal gains ($K_1$, $K_2$, and $K_3$) could
be found by initially setting them to zero and updating the values of
$K_1$, $K_2$, and $K_3$ using an iterative scheme until the performance index
(eqn. 5.3.2-1) is minimized. To speed computations, the parallel
integration methods discussed in Section 3.2 may be used to integrate
eqns. (5.3-4) - (5.3-9), while the selection of the next value of $K_1$,
$K_2$, and $K_3$ may be made using one of the parallel minimization methods
described in Section 3.1. Because the parallel numerical procedures
can account for all the nonlinearities in eqns. (5.3-4), (5.3-6),
(5.3-7), and (5.3-9), the optimized control law:

$$ u(t) = 0.138\,x_1(t) + 0.385\,x_2(t) + 0.243\,x_3(t), \qquad t \ge 0 \tag{5.3.2-6} $$

should be superior to the controllers defined by eqns. (5.3.2-2) -
(5.3.2-4).

To determine if this was indeed true, a simulation of the
response of the F-8 aircraft to each feedback control law was
conducted and the resulting trajectories were plotted (see Figures 5.16 -
5.19). Since the objective was to reduce an angle of attack
disturbance to the origin as rapidly as possible, it is clear that the
controller given by eqn. (5.3.2-6) is indeed superior to the others.

Also of interest was a comparison of the performance of the
parallel and serial minimization methods discussed in Chapter Three.
In Table 5.10, the parallel and serial methods are compared by
counting the total number of iterations* required for convergence to a set
of control gains for the F-8 aircraft. This is appropriate because the
parallel algorithms require n + 1 gradients to be evaluated
simultaneously using n + 1 processors and a univariate search per iteration
while the serial algorithms require one gradient evaluation and a
univariate search per iteration. Hence, the computer time needed per
iteration by each method will be nearly the same provided the parallel
operations are done simultaneously.

* One cycle of either method consists of evaluating the gradient of
eqn. (5.3.2-1), using this information to construct a direction of
search and performing a univariate search in this direction.

If we assume the parallel algorithms are executed on the
parallel computer discussed in Section 4.1, and that the criterion above
is used to compare the methods, then the results in Table 5.10 indicate
that the parallel algorithms would require significantly less time to
execute compared with the serial algorithms. Also, the results in
Table 5.10 show that the parallel methods used fewer function
evaluations* to achieve convergence. Thus, it can be concluded that the
parallel algorithms can be very effective in determining the control
gains needed by flight control systems.
5.3.3  Timing Consideration

In this section, the execution time required for convergence
to a set of optimal control gains will be estimated for the F-8
aircraft. First, this will be done assuming a completely sequential
algorithm is executed on a serial computer. Secondly, the execution
time will be estimated for a completely parallel algorithm which is
assumed to be executed on the parallel computer described in Section
4.1. Finally, these two estimates will be used to estimate the
speed-up due to parallelism.

From the high angle of attack model of the F-8 aircraft,
it is easily verified that n = 6, $A_{rhs}$ = 21, and $M_{rhs}$ = 32. If we
let N = 100, $t_a$ = 200 nsec, $t_m$ = 1000 nsec, and L = 8, then for the
DFP method with APC integration the execution time per iteration is
0.1277 seconds using eqn. (4.2.2-12). On the other hand, for the PVM
algorithm with PPC42 integration, the execution time per iteration is
0.06316 seconds using eqn. (4.2.2-14). In this case, the speed-up due
to parallelism is simply 0.1277/0.06316 = 2.02, which shows that one
iteration of the parallel algorithm will execute about twice as fast
as the serial algorithm. Note that a further reduction in computation
time can be achieved if the RHS evaluations were performed by separate
processors, i.e., one processor for each state variable.

If this is considered at the expense of additional
processors, then the execution time for one iteration of the PVM
algorithm would be only 0.0106 seconds. This time the speed-up due to
parallelism is 0.1277/0.0106 = 12.05, which is rather significant.
Other possibilities, along with the processors required, are shown in
Table 5.11.

* One function evaluation includes all arithmetic operations needed to
evaluate eqn. (5.3.2-1).

To estimate the time required for convergence to a set of
optimal gains, the results shown in Table 5.10 and Table 5.11 can be
used. For example, the execution time of the DFP/APC and PVM/PPC42
algorithms could be estimated as follows. From Table 5.10, the serial
DFP algorithm required 14 iterations to converge. Using this fact,
and the fact that one iteration of the DFP/APC algorithm requires
0.1277 seconds (see Table 5.11) when one processor is available, the
execution time required for convergence is simply
14 x 0.1277 = 1.7878 seconds. On the other hand, if the PVM/PPC42
algorithm is executed using 8 processors, then the time required for
convergence is only 8 x 0.0106 = 0.0848 seconds from the results shown
in Table 5.10 and Table 5.11. Thus, if a completely parallel algorithm
is executed on the parallel computer described in Section 4.1, the
control computations might be rapid enough to permit adjustment of the
control gains in real time. Finally, the advantage of using a
completely parallel algorithm is further reinforced by computing the
speed-up due to parallelism based on the total time required for
convergence. From the calculations above, the speed-up
is 1.7878/0.0848 = 21.08, which indicates that the parallel algorithm
converged more than 20 times faster than the serial algorithm.
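
The convergence-time arithmetic above is simple enough to verify directly. The short Python sketch below uses only the per-iteration times and iteration counts quoted in this section; the variable names are chosen here for illustration.

```python
t_serial_iter = 0.1277     # seconds per DFP/APC iteration, one processor
t_parallel_iter = 0.0106   # seconds per PVM/PPC42 iteration, 8 processors
iters_serial = 14          # DFP iterations to converge (Table 5.10)
iters_parallel = 8         # PVM iterations to converge (Table 5.10)

t_serial = iters_serial * t_serial_iter
t_parallel = iters_parallel * t_parallel_iter

speedup_per_iteration = t_serial_iter / t_parallel_iter
speedup_to_convergence = t_serial / t_parallel

print(f"serial: {t_serial:.4f} s, parallel: {t_parallel:.4f} s")
print(f"speed-up per iteration: {speedup_per_iteration:.2f}")
print(f"speed-up to convergence: {speedup_to_convergence:.2f}")
```

The overall speed-up exceeds the per-iteration speed-up because the parallel method also needs fewer iterations, so the two effects multiply.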
5.4  Evaluation of Adaptive Controller Performance

The purpose of this section is to evaluate the performance
of the explicit adaptive control scheme discussed in Section 2.3.
This is accomplished by initially designing a feedback controller in
Section 5.4.1 which will cause the F-8 aircraft to follow a nominal
pitch rate command of 5°/sec based upon a nominal set of aerodynamic
parameters. In Section 5.4.2, the effectiveness of the parallel
algorithm is demonstrated by adjusting the feedback control gains
on-line in response to a 10°/sec pitch rate command. Finally, in Section
5.4.3, the feedback control gains will be adapted in response to
variations in the aerodynamic parameters of the F-8 aircraft using a moving
window, explicit adaptive control scheme.

5.4.1  Gain Optimization

In this section, the problem is to find a feedback control
law which causes the F-8 aircraft to follow a pitch rate command. The
pitch rate command considered in this example was:

$$ \dot{\theta}_c = \begin{cases} 5^\circ/\text{sec} & t \in [0, 2] \\ 0 & \text{otherwise} \end{cases} \tag{5.4.1-1} $$

and the state model (valid for low angles of attack) considered was:

$$ \dot{x}_1 = (1 - x_1^2 - 0.088\,x_1)\,x_3 - 0.877\,x_1 + 0.47\,x_1^2 + 3.846\,x_1^3 - 0.215\,u + 0.28\,u\,x_1^2 + 0.47\,u^2 x_1 + 0.63\,u^3 - 0.019\,x_2^2 \tag{5.4.1-2} $$

$$ \dot{x}_2 = x_3 \tag{5.4.1-3} $$

$$ \dot{x}_3 = -0.396\,x_3 - 4.208\,x_1 - 0.47\,x_1^2 - 3.564\,x_1^3 - 20.967\,u + 6.265\,u\,x_1^2 + 46\,u^2 x_1 + 61.4\,u^3 \tag{5.4.1-4} $$

$$ \dot{x}_4 = 0 $$

where $x_1(t)$, $x_2(t)$, $x_3(t)$, and $x_4(t)$ represent the angle of attack,
pitch angle, pitch rate, and pitch rate command, respectively.

The structure of the controller is shown in Figure 5.20 and
the control law to be optimized is:

$$ u(t) = K_1\,x_1(t) + K_2\,x_2(t) + K_3\,x_3(t) + G\,(x_4(t) - x_3(t)) \tag{5.4.1-5} $$

Since the objective is to find the control gains $K_1$, $K_2$, $K_3$, and G
which cause the pitch rate of the F-8 aircraft to track the pitch rate
command, the following performance index was specified:


$$ J = \frac{1}{2} \int_0^5 \left[ x^T Q\,x + r\,u^2 \right] dt \tag{5.4.1-6} $$

where

$$ Q = \begin{bmatrix} 0.25 & 0 & 0 & 0 \\ 0 & 0.25 & 0 & 0 \\ 0 & 0 & 1000.0 & -1000.0 \\ 0 & 0 & -1000.0 & 1000.0 \end{bmatrix} $$

and r = 1.0.
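
The role of the off-diagonal entries of Q becomes clear when the quadratic form is expanded: the lower-right block contributes 1000 times the squared difference between the pitch rate $x_3$ and the pitch rate command $x_4$, so it penalizes tracking error directly. The Python sketch below checks this identity numerically; the function names and the sample state are introduced here for illustration.

```python
Q = [[0.25, 0.0, 0.0, 0.0],
     [0.0, 0.25, 0.0, 0.0],
     [0.0, 0.0, 1000.0, -1000.0],
     [0.0, 0.0, -1000.0, 1000.0]]


def quad_form(x, Q=Q):
    """Evaluate x^T Q x for a 4-element state x."""
    return sum(x[i] * Q[i][j] * x[j] for i in range(4) for j in range(4))


def expanded_form(x):
    """Same quantity written as state penalties plus a tracking penalty."""
    x1, x2, x3, x4 = x
    return 0.25 * x1**2 + 0.25 * x2**2 + 1000.0 * (x3 - x4)**2


x = (0.1, -0.05, 0.02, 0.087)   # arbitrary sample state
print(quad_form(x), expanded_form(x))
```

With a weight of 1000 on the tracking error against 0.25 on the angle of attack and pitch angle, the index is dominated by how closely the pitch rate follows the command.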


Initially, the unknown control gains were selected as $K_1$ = -0.1,
$K_2$ = -0.001, $K_3$ = -0.04 and G = -0.4 since these values caused the F-8
aircraft to remain stable over the entire mission time interval [0, 5].

At this point, the direct gain optimization algorithm
discussed in Section 2.2.3 was used to optimize the control gains assuming
the initial state of the F-8 aircraft was $x_0 = (0\ \ 0\ \ 0\ \ 0.087)^T$. The
performance of the different methods considered is shown in Table 5.12.

The results indicate:

•  Each of the minimization procedures converged to a set of gains
which cause the tracking error (performance index) to be reduced
from J = 0.635004 x 10^ initially to J = 0.559612 x 10^ upon
convergence.

•  The PVM algorithm required nearly half the function and gradient
evaluations to converge compared with the serial DFP algorithm.

By substituting the optimized control gains shown in Table
5.12 into eqn. (5.4.1-5), the optimized control law can be obtained as
follows:

$$ u = -0.128\,x_1 - 0.009\,x_2 - 0.046\,x_3 - 0.427\,(x_4 - x_3) \tag{5.4.1-7} $$

To determine how well this controller would cause the F-8
aircraft to track a 5°/sec pitch rate command, a simulation was
performed. The pitch rate command and the pitch rate response which
resulted from applying the optimized control law (eqn. 5.4.1-5) are
shown in Figure 5.21. Looking at Figure 5.21, it is clear that the
pitch rate response of the F-8 aircraft tracked the pitch rate
command relatively well. Thus, our nominal design is complete.

5.4.2  Adaptive Gain Optimization

The controller given by eqn. (5.4.1-5) in the previous section
is optimal only if the initial state of the F-8 aircraft is
at the origin and a 5°/sec (0.087 radian/sec) pitch rate command is
considered. If, however, the magnitude of the pitch rate command is
different from 5°/sec, the gains will have to be reoptimized. Thus,
it is of interest to determine how rapidly the parallel algorithms
can adjust the feedback gains in response to a different pitch rate
command.

To illustrate this, suppose the pitch rate command is 10°/sec
(0.174 radians/sec), i.e., twice the magnitude of the nominal pitch rate,
and the initial state of the F-8 aircraft is (0 0 0 0.174). The problem
is then to reoptimize the control gains to account for the new pitch
rate command.

FIGURE 5.21:  Pitch Rate Tracking (curves: pitch rate command and
pitch rate response; vertical axis in radians/sec)


This was accomplished by performing only two iterations of
the direct gain optimization algorithm, which was initialized with the
optimal gains for the nominal 5°/sec pitch rate command case. The
results are shown in Table 5.13.

The results indicate that after only two iterations of the
PVM algorithm, convergence to the optimal set of control gains is
possible. However, if the DFP algorithm is utilized to adapt the
control gains, the gains obtained after adaptation were still relatively
far from optimal.

In view of these results, and the fact that the execution
time required for one iteration of the parallel algorithms has been
shown to be much shorter than that of the sequential methods, it appears
that the parallel algorithms may be applicable to updating the control
gains in real time. This concept is pursued further in the next section.


5.4.3  Moving Window Adaptive Gain Optimization

As  a  final  topic  in  this  chapter,  an  explicit  adaptive 
controller  will  be  designed  to  stabilize  the  F-8  aircraft  as  the 
aircraft's  center  of  gravity  is  moved  aft  during  flight. 

To determine the point at which the F-8 aircraft becomes
unstable, the differential pitch rate was computed analytically
using the expression:

$$ d\dot{x}_3 = \frac{\partial \dot{x}_3}{\partial \ell}\, d\ell \tag{5.4.3-1} $$

where $d\ell$ is an incremental change in the distance between the wing
aerodynamic center and the aircraft's center of gravity (see Figure


Adaptive  0  Command  Following: 


(-0.1397815  -0.009093014  -0.0li272l491  -0.U292372) 


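To evaluate eqn. (5.4.3-1), the pitch rate equation below
was utilized.

$$
\begin{aligned}
\dot{x}_3 ={}& M_w/I_y + \left(C^0_{L_w} + C^1_{L_w} x_1 - C^2_{L_w} x_1^3 - C^0_{L_w} x_1^2/2 - C^2_{L_w} x_1^3/2\right) q S \ell / I_y \\
& - \left\{\left[C^0_{L_t} + C^1_{L_t}(x_1 - \epsilon_0 - a_\epsilon x_1 + u) - C^2_{L_t}(x_1 - \epsilon_0 - a_\epsilon x_1 + u)^3\right] + a_e u\right\} \\
& \quad \cdot \left(1 - (x_1 - \epsilon_0 - a_\epsilon x_1 + u)^2/2\right) q S_t \ell_t / I_y - c\, x_3 / I_y
\end{aligned}
\tag{5.4.3-2}
$$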

By substituting the aerodynamic data shown in Table 5.14 into eqn.
(5.4.3-2) and using the fact that $\ell + \ell_t = 16.889$, it can be shown
with some effort that

$$ \frac{\partial \dot{x}_3}{\partial \ell} = 5.27\, x_1 - 22.25\, x_1^3 + 1.23\, u - 0.614\, u^3 - 0.7748\, u\, x_1^2 - 3.24\, x_1 u^2. \tag{5.4.3-3} $$

Substituting eqn. (5.4.3-1) into eqn. (5.3-6), the modified equations
of motion of the F-8 aircraft become:

$$ \dot{x}_1 = (1 - x_1^2 - 0.088\, x_1)\, x_3 - 0.877\, x_1 + 0.47\, x_1^2 + 3.846\, x_1^3 - 0.215\, u - 0.019\, x_2^2 + 0.28\, u\, x_1^2 + 0.47\, u^2 x_1 + 0.63\, u^3 \tag{5.4.3-4} $$

$$ \dot{x}_2 = x_3 \tag{5.4.3-5} $$

TABLE 5.14: F-8 AIRCRAFT DATA

Mach = 0.85     Altitude = 30,000 ft.

C^1_{L_w} = C^1_{L_t} = 4.0
C^2_{L_w} = C^2_{L_t} = 12.0
= 0.1
= 375 ft² (33.75 m²)
= 93.4 ft² (8.41 m²)
= 667.7 slugs (9773 kg)
= 0.75
= 0
= 0
= 11.78 ft (3.53 m)
= 96,800 slug ft² (127,512 kg-m²)
= 0.189 ft (0.06 m)
= 16.7 ft (5.01 m)
= 318.0116


$$ \dot{x}_3 = -0.396\, x_3 + (5.27\, d\ell - 4.208)\, x_1 - 0.47\, x_1^2 - 3.564\, x_1^3 - 20.967\, u + 6.265\, u\, x_1^2 + 46\, u^2 x_1 + 61.4\, u^3 + (1.23\, u - 22.25\, x_1^3 - 0.614\, u^3 - 0.7748\, u\, x_1^2 - 3.24\, u^2 x_1)\, d\ell \tag{5.4.3-6} $$

Note that if $d\ell = 0$, then eqns. (5.4.3-4) - (5.4.3-6) reduce to the
low angle of attack model of the F-8 aircraft given by eqns. (5.3-4) -
(5.3-6).


By increasing $d\ell$ incrementally from $d\ell = 0$ to $d\ell = 1.5$ in
eqns. (5.4.3-4) - (5.4.3-6), the point at which the F-8 aircraft
becomes unstable can be determined by monitoring the open loop response
of the aircraft and determining when the response doubles in amplitude.
From the open loop response, it was concluded that under nominal
conditions ($d\ell = 0$), the aircraft is stable. However, as $d\ell$ is
increased, the aircraft becomes unstable for $d\ell \ge 1$.
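
As a rough cross-check on the simulation-based procedure just described, the stability boundary can also be estimated from a linearization about the origin. The sketch below is not the method used in this study. It assumes that, with u = 0, the linear parts of eqns. (5.4.3-4) and (5.4.3-6) are dx1/dt = -0.877 x1 + x3 and dx3/dt = (5.27 dl - 4.208) x1 - 0.396 x3, and it ignores the pitch angle x2, which enters only as a pure integrator. The function names are hypothetical.

```python
def short_period_matrix(dl):
    """Assumed linearized (x1, x3) dynamics about the origin with u = 0."""
    return ((-0.877, 1.0),
            (5.27 * dl - 4.208, -0.396))


def is_stable(dl):
    """A real 2x2 system is asymptotically stable iff trace < 0 and det > 0."""
    (a11, a12), (a21, a22) = short_period_matrix(dl)
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    return trace < 0.0 and det > 0.0


def critical_dl():
    """Value of dl at which the determinant changes sign."""
    return (0.877 * 0.396 + 4.208) / 5.27


if __name__ == "__main__":
    for dl in (0.0, 0.5, 1.0, 1.5):
        print(dl, is_stable(dl))
    print(critical_dl())
```

Under these assumptions the linearized model loses stability at a value of dl somewhat below 1, which is in line with the instability observed in the open loop simulations for values of 1 and above.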

In view of these results, the remainder of this section is
concerned with the design of an explicit adaptive controller which
will stabilize the F-8 aircraft as $d\ell$ is increased from $d\ell = 0$ to
$d\ell = 1.5$. Since the direct adaptive control algorithm discussed in
Section 2.2.3 must be initialized with a set of stabilizing gains, such a
set of gains must be determined a priori based upon a set of nominal
conditions.


As indicated earlier, the nominal conditions for the example
under consideration are $d\ell = 0$ and a nominal initial state of
$x_0 = (0.349\ 0\ 0)$. If we restrict the control to be linear of the form:

$$ u(t) = K_1 x_1(t) + K_2 x_2(t) + K_3 x_3(t), \quad t > 0 \tag{5.4.3-7} $$

then the problem is simply to find the control gains $K_1$, $K_2$ and $K_3$
which minimize a suitably defined performance index such as:

$$ J = \int \left( x^T(t)\, Q\, x(t) + r\, u^2(t) \right) dt \tag{5.4.3-8} $$

subject to the F-8 aircraft's equations of motion described by eqns.
(5.4.3-4) - (5.4.3-6).

The weights Q and r were chosen as

$$ Q = \begin{bmatrix} 0.25 & 0 & 0 \\ 0 & 0.25 & 0 \\ 0 & 0 & 0.25 \end{bmatrix} \quad \text{and} \quad r = 1.0 $$

since this choice of Q and r gave good response in previous examples
when $d\ell = 0$. Because the open loop response of the F-8 aircraft is
stable over the entire mission time interval [0, 5], the control gains
were initially set to zero.
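
To make the quantity being minimized concrete, the following sketch evaluates a performance index of the form (5.4.3-8) for a given set of linear feedback gains. It is an illustration under stated assumptions, not the implementation used in this study: the right-hand side is one reading of eqns. (5.4.3-4) - (5.4.3-6), the integral is taken over the mission interval [0, 5], Q is taken as 0.25 times the identity, and a serial fourth-order Runge-Kutta scheme with a trapezoidal cost rule stands in for the parallel integration methods. The names `f8_rhs` and `performance_index` are hypothetical.

```python
def f8_rhs(x, u, dl=0.0):
    """Assumed reading of eqns. (5.4.3-4)-(5.4.3-6); dl is the c.g. shift."""
    x1, x2, x3 = x
    dx1 = ((1.0 - x1**2 - 0.088 * x1) * x3 - 0.877 * x1 + 0.47 * x1**2
           + 3.846 * x1**3 - 0.215 * u - 0.019 * x2**2
           + 0.28 * u * x1**2 + 0.47 * u**2 * x1 + 0.63 * u**3)
    dx2 = x3
    dx3 = (-0.396 * x3 + (5.27 * dl - 4.208) * x1 - 0.47 * x1**2
           - 3.564 * x1**3 - 20.967 * u + 6.265 * u * x1**2
           + 46.0 * u**2 * x1 + 61.4 * u**3
           + (1.23 * u - 22.25 * x1**3 - 0.614 * u**3
              - 0.7748 * u * x1**2 - 3.24 * u**2 * x1) * dl)
    return (dx1, dx2, dx3)


def performance_index(gains, x0, t_final=5.0, steps=500, q=0.25, r=1.0, dl=0.0):
    """Approximate J = integral of (x'Qx + r u^2) dt with Q = q * I."""
    def control(x):
        # Linear state feedback u = K1 x1 + K2 x2 + K3 x3
        return sum(k * xi for k, xi in zip(gains, x))

    def closed_loop(x):
        return f8_rhs(x, control(x), dl)

    def integrand(x):
        u = control(x)
        return q * sum(xi * xi for xi in x) + r * u * u

    h = t_final / steps
    x = tuple(x0)
    cost = 0.0
    for _ in range(steps):
        # Classical RK4 step for the closed-loop state equations
        k1 = closed_loop(x)
        k2 = closed_loop(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k1)))
        k3 = closed_loop(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k2)))
        k4 = closed_loop(tuple(xi + h * ki for xi, ki in zip(x, k3)))
        x_next = tuple(xi + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
                       for xi, a, b, c, d in zip(x, k1, k2, k3, k4))
        # Trapezoidal rule for the running cost
        cost += 0.5 * h * (integrand(x) + integrand(x_next))
        x = x_next
    return cost


if __name__ == "__main__":
    print(performance_index((0.0, 0.0, 0.0), (0.349, 0.0, 0.0)))
```

A gain optimization routine would call `performance_index` repeatedly with trial gains; with all gains set to zero the function returns the open loop cost from which the optimization starts.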

At this point, the direct gain optimization procedure was
employed to optimize the control gains. The resulting control law
was determined to be:

$$ u(t) = 0.1416\, x_1(t) + 0.8036\, x_2(t) + 0.6488\, x_3(t), \quad t > 0 \tag{5.4.3-9} $$

Now that the nominal design is complete, the optimized control gains
$K_1 = 0.1416$, $K_2 = 0.8036$, and $K_3 = 0.6488$ can be used to initialize
the adaptive control algorithm.

Before the direct adaptive control algorithm described in
Section 2.3.2 can be utilized, the adaptation times $t_1, t_2, \ldots$
must be specified a priori. However, since the adaptation times are,
in general, chosen somewhat arbitrarily, uniform adaptation intervals
were considered. In particular, since the duration of the mission time
is only five seconds, the adaptation times were selected as $t_1 = 1$,
$t_2 = 2$, $t_3 = 3$, and $t_4 = 4$. Also, in the simulations it was assumed
that $d\ell$ varies linearly from the stable condition ($d\ell = 0$) to the
unstable condition ($d\ell = 1.5$) as follows:

$$ d\ell(t) = \tfrac{3}{10}\, t \quad \forall\, t \in [0, 5] \tag{5.4.3-10} $$

Because only the effectiveness of the control update algorithm
was being studied in this example, it was assumed that perfect
estimates of $d\ell$ were available as needed. The adaptive control scheme
was evaluated by performing one iteration of the direct gain optimization
procedure assuming the actual values of $d\ell$ were available at the
adaptation times.

To determine if the parallel algorithm could indeed optimize
the control gains more rapidly than serial methods, the PVM and DFP
algorithms were considered. The results obtained are shown in Tables
5.15 and 5.16.

The results indicate that the PVM algorithm could indeed reduce
the performance index more rapidly than the sequential DFP method.
This is more clearly revealed by summing the performance index values
after adaptation for each method. For the PVM algorithm, this amounts
to:

$$ \sum_{i=0}^{4} J_i = 0.042 $$

CHAPTER SIX


CONCLUDING  REMARKS 


In Section 1.2, a survey of existing parallel identification,
estimation and control algorithms and an evaluation of their
usefulness was made in terms of accuracy, speed, processor requirements,
and numerical efficiency. From this survey, it was clear that
the major problems with existing methods were the lack of accuracy
and excessive computation time. Also, it was revealed that parallelism
can be employed to alleviate such problems. Thus, the need for
developing more efficient parallel procedures based upon modern
nonlinear estimation and control theory was established. This fact led
to the development of several identification, estimation and control
algorithms which employ a high degree of parallelism but at the same
time are not extravagant in the utilization of processing elements.
Whereas most existing estimation and control algorithms had been
designed using approximate linearized equations of motion, the parallel
procedures developed in this thesis utilize the nonlinear process
equations directly.

The nonlinear estimation and control algorithms developed
in this thesis employ parallel minimization methods to accelerate
convergence, parallel methods for integrating ordinary differential
equations to facilitate computations, and a procedure based upon
partitioning the integration interval to improve accuracy and reduce the
sensitivity of the overall algorithm.

The major contributions which resulted from investigating
each phase of the nonlinear estimation and control algorithms
consisted of:

•  Developing  a  class  of  parallel  rank-two  quasi-Newton  methods  for 
unconstrained  minimization. 

•  Establishing  a  strategy  for  optimally  selecting  the  number  of 
subintervals  and  mesh  points  associated  with  the  parallel  shooting 
approach  to  solving  nonlinear  two-point  boundary  value  problems. 

•  Developing  a  procedure  which  automatically  adjusts  the  step  size 
of  a  parallel  predictor-corrector  integration  scheme  to  maintain 
a  desired  level  of  accuracy. 

•  Demonstrating with representative examples that the newly
developed parallel algorithms do indeed perform better than existing
sequential methods in terms of speed, accuracy, and reliability.

•  Applying the PQN method, PVM algorithm and the CM method to solving
dynamic optimization problems (such as nonlinear estimation and
control problems) rather than static optimization problems
involving algebraic functions.

The remainder of this chapter is divided into three sections.
In Section 6.1, some conclusions are drawn based upon the results
obtained as a consequence of conducting this research. In Section 6.2,
some recommendations are made, and areas of future research are
suggested in Section 6.3.

6.1 Conclusions


In this section, some conclusions are drawn based upon the
analytical and empirical results obtained in Chapters Three, Four and
Five.

From the results in Chapter Three, it can be concluded that
without question the parallel minimization algorithms do indeed
require significantly fewer iterations for convergence compared with
serial methods (see Tables 3.1-3.6). In fact, it was shown analytically
that if the PQN algorithm is utilized to minimize a quadratic
function in n variables, then convergence to the location of the
minimum is guaranteed in only one iteration provided n + 1 degrees
of parallelism are employed (see Theorem 3.2 and Table 3.1). Since
the PQN method was generally more robust than the PVM algorithm, this
result suggests that parallel double-rank methods might be more robust
than parallel rank-one methods (see Figures 3.1-3.6).

From the timing equations derived in Chapter Four and the
timing results in Chapter Five, it was revealed that one iteration
of the (parallel or serial) nongradient algorithms required much less
time to execute than did the (parallel or serial) gradient-dependent
methods, although more iterations of the nongradient methods were
usually required for convergence (see eqns. (4.2.2-11) - (4.2.2-14)
and Tables 5.8, 5.9, and 5.11). Also along these lines, it was shown
that one iteration of the indirect control algorithm required
significantly more time and processors to execute compared with the direct
gain optimization procedure (see Tables 5.8 and 5.9). This observation
was also valid for the indirect and direct SAP estimation algorithms.

From the simulations performed in Chapter Five, it can be
concluded that although the gradient of a highly nonlinear function
may be difficult at best to compute numerically, the convergence
properties of the gradient-dependent algorithms were clearly preferable
to those of the nongradient methods (see Tables 5.1-5.7). From the
results shown in Figure 5.2, it was revealed that the robustness of the
PVM algorithm was enhanced the most when parallel methods rather than
serial methods were employed to integrate the state and costate
equations associated with the Van der Pol system. This result was obtained
using ordinary shooting. However, when parallel shooting was
considered, the number of unknown boundary conditions which must be found
was artificially increased from 2n to n(2N-1) where n is the order of
the system and N is the number of subintervals. Despite this fact,
as the integration interval is partitioned into many subintervals, the
sensitivity of the solution will be reduced, and in general, the
solution obtained will be more accurate. Unfortunately, since a high
order optimization problem must be solved (i.e., n(2N-1) unknowns must
be found), the number of iterations required for convergence increases
as well (see Tables 5.6 and 5.7).

When the AMS algorithm was used to optimally select the
mesh points required by the parallel shooting algorithm, it was
revealed that the local truncation error could indeed be minimized,
although many iterations were required here also. In fact, it was
shown that a 20% improvement in accuracy was possible by employing
the AMS algorithm (see Section 5.1.3).

From the SAP estimation results obtained in Section 5.2, it
can be concluded that even if only poor estimates of the unknown initial
state and parameters of the T-33 aircraft were available initially,
convergence to the true initial state and parameters was possible even
when the measurement data was extremely noisy (see Table 5.5).

From the results obtained in Section 5.3, it can be
concluded that the response of the F-8 aircraft could be improved
significantly if the control was designed by employing the nonlinear control
algorithms developed in Section 2.2. In particular, it was revealed
that it was better to design a simple feedback control law using the
F-8 aircraft's nonlinear equations of motion directly rather than
to approximate the equations of motion and employ linear quadratic
regulator (LQR) theory or utilize a more complex control law.

Finally, it can be concluded from the adaptive control
results obtained in Section 5.3, that the direct gain optimization
procedure might be implemented in an on-line adaptive fashion. This
follows from the fact that after only two iterations, the PVM
algorithm converged to an optimal set of control gains while after two
iterations of the serial DFP method, the control gains remained
relatively far from optimal (see Table 5.13).

6.2  Recommendations 

In view of the results obtained in this thesis, the following
recommendations are in order.

•  The  weighting  parameter,  c,  which  defines  a  set  of  basis  vectors 
for  the  PVM,  PBFS,  and  PDFP  methods,  should  be  set  to  c  =  10  ^  since 
this  choice  of  c  gave  the  best  overall  performance  (see  Figures 
3.1-3.6).

•  In view of the superior robustness characteristics of the PBFS
method, it might be considered rather than the PDFP and PVM
methods, although the PVM method did converge faster than the PBFS
method in many test cases (see Tables 3.1-3.6).


•  If convergence problems are encountered when using the PQN method
to solve nonlinear estimation and control problems, the following
modifications of the basic procedure are recommended:

1. Update the inverse Hessian, $H^{(\ell)}$, in Step 4 of the
PQN algorithm only if $d_j^T y_j > 0\ \forall\, j = 1, 2, \ldots, n$. As indicated
by Proposition 3.4, this modification will guarantee that the inverse
update will be positive definite.

2. Replace Step 3 of the PQN method with the following:
a. Compute n + 1 gradients of f(x) at n + 1 distinct
points in parallel:

$$ g(x_\ell) \quad \text{and} \quad g_j = g(x_\ell + c\, d_j), \quad j = 1, 2, \ldots, n $$

b. Compute the gradient difference in parallel:

$$ y_j = (g_j - g(x_\ell))/c $$

In the above modification, c is the same weighting parameter
used to define the basis set

$$ \Sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n) = c\, I_n; \quad c > 0. $$

By performing computations in this manner, the gradients required can
be computed more reliably because the forward integration of the state
and costate equations will remain stable. Note that when f(x) is
quadratic, $y_j = (c\, A\, d_j)/c = A\, d_j$ for the modified version of Step 3.

PQN method are unaffected by the modifications cited above.
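
The modified Step 3 can be illustrated with a short sketch. This is a hypothetical illustration rather than the PQN implementation itself: the helper name `gradient_differences`, the test quadratic, and the use of a thread pool to stand in for separate processors are all assumptions. It evaluates the n + 1 gradients concurrently and forms the scaled differences, which for a quadratic f(x) = 0.5 x'Ax - b'x should reproduce A d_j.

```python
from concurrent.futures import ThreadPoolExecutor


def gradient_differences(grad, x, directions, c):
    """Return y_j = (g(x + c d_j) - g(x)) / c for every direction d_j.

    The n + 1 gradient evaluations are independent of one another, so they
    are submitted to a thread pool to mimic parallel evaluation.
    """
    points = [list(x)]
    points += [[xi + c * di for xi, di in zip(x, d)] for d in directions]
    with ThreadPoolExecutor() as pool:
        grads = list(pool.map(grad, points))
    g0 = grads[0]
    return [[(gi - g0i) / c for gi, g0i in zip(gj, g0)] for gj in grads[1:]]


# Hypothetical quadratic test problem: gradient is A x - b
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]


def quad_grad(x):
    return [sum(aij * xj for aij, xj in zip(row, x)) - bi
            for row, bi in zip(A, b)]


if __name__ == "__main__":
    d = [[1.0, 0.0], [0.0, 1.0]]
    print(gradient_differences(quad_grad, [0.5, -1.0], d, c=0.5))
```

Because each difference is divided by c, the result does not depend on the step length when f is quadratic, which is the property noted above.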

In view of the results presented in Section 3.2.3, the
PPCU2V integration scheme is recommended for solving the required
initial-value problems (IVPs) since the accuracy of the solution can
be specified a priori. Also, because the PPCU2V method has been
designed to execute on separate processors, the solution to an IVP can be
obtained extremely rapidly.

With regard to the nonlinear state and parameter (SAP)
estimation algorithms, the direct SAP estimation algorithm should be used
only if process noise is omitted from the state model. On the other
hand, if process noise is included in the state model, then the indirect
method should be considered. If sensitivity problems are encountered,
the parallel shooting method with adaptive mesh selection has proven
to be very effective in alleviating such problems.

With regard to the control algorithms, the direct gain
optimization procedure is highly recommended in view of the fact that
near optimal response was obtained without an excessive amount of
computation (see Figures 5.6-5.10). Also, this method should be
seriously considered because the equations of motion of a highly
nonlinear system can be utilized directly in the control system design
process.

6.3 Areas of Future Research

In  this  section,  some  areas  of  future  research  are  suggested. 

One aspect of the PQN method which could benefit from
additional research is the generation of a set of mutually conjugate
directions. In particular, alternate parallel methods should be
considered for solving the linear system of equations required to generate
the direction vectors. Since each row of the $C_{m-1}$ matrix defined in
Proposition 3.2 is known once m has been specified, the Gaussian
Elimination procedure [22] might be modified to solve the resultant
linear system in a row-wise fashion. Of course, this modification
should be amenable to parallel computation.

Another area of future research might be the extension of the
parallel variable step size integration method derived in Section 3.2.2
such that the order of the method, as well as the step size, can be
automatically adjusted to maintain a desired level of accuracy. This
concept was initially investigated by C. W. Gear in reference [46] although
Gear's work was concerned with purely sequential methods at that time.

With regard to the parallel computer described in Section
4.1, future research should be conducted in the following areas:

•  Specifying processor add, multiply and transfer times to permit
real time estimation and control.

•  Estimating memory size and peripheral requirements.

•  Studying the effects of word size.

•  Analytically modeling the reliability of the proposed design and
studying the effects of component failures (such as one of the
processing elements).

•  Determining the feasibility of implementation and cost.

Another possibility might be to develop a parallel nonlinear
estimation and control algorithm based upon Hamilton-Jacobi-Bellman
(HJB) theory [47]. To illustrate how the control algorithm might be
arranged, consider the optimal control problem:

$$ \min_{u} V(x_0, t_0) = \phi(x(t_f), t_f) + \int_{t_0}^{t_f} L(x, u, t)\, dt \tag{6.3-1} $$

subject to

$$ \dot{x} = f(x, u, t), \quad t \in [t_0, t_f] \tag{6.3-2} $$

If we assume that $x(t_0) = x_0$ is known, $t_f$ is specified and
$x(t_f)$ is unspecified, then the HJB equation which must be satisfied is
simply:

$$ \frac{\partial V(x, t)}{\partial t} + L(x, u, t) + \left[\frac{\partial V(x, t)}{\partial x}\right]^T f(x, u, t) = 0 \tag{6.3-3} $$

The boundary condition associated with eqn. (6.3-3) is

$$ V(x(t_f), t_f) = \phi(x(t_f), t_f) \tag{6.3-4} $$


By defining the Hamiltonian as:

$$ H(x, u, \lambda, t) = L(x, u, t) + \lambda^T(t)\, f(x, u, t) \tag{6.3-5} $$

it can be shown that the adjoint variable, $\lambda(t)$, is given by
$\lambda(t) = \partial V(x, t)/\partial x$. From the maximum principle, it is well known that
the optimal controls must satisfy the necessary condition $\partial H/\partial u = 0$.
If this condition can be solved explicitly for u(t), the control will
be of the form:

$$ u(t) = h[x(t), \lambda(t), t] \tag{6.3-6} $$

But since $\lambda(t) = \partial V(x, t)/\partial x$, the optimal control is:

$$ u(t) = h[x(t), \partial V(x, t)/\partial x, t] \tag{6.3-7} $$

By substituting eqn. (6.3-7) into eqn. (6.3-3), the result
is:

$$ \frac{\partial V}{\partial t} + L\!\left(x, h\!\left[x, \frac{\partial V}{\partial x}, t\right], t\right) + \left[\frac{\partial V(x, t)}{\partial x}\right]^T f\!\left(x, h\!\left[x, \frac{\partial V}{\partial x}, t\right], t\right) = 0 \tag{6.3-8} $$

The problem then is to find the continuous function V(x, t)
which satisfies eqn. (6.3-8) and the boundary condition (6.3-4) subject
to the dynamic constraint (6.3-2).

In  view  of  the  above  problem  formulation,  an  appropriate 
error  function  might  be: 


dt 


(6.3-9) 


where

$$ e(t) = L\!\left(x, h\!\left[x, \frac{\partial V}{\partial x}, t\right], t\right) + \left[\frac{\partial V}{\partial x}\right]^T f\!\left(x, h\!\left[x, \frac{\partial V}{\partial x}, t\right], t\right) + \frac{\partial V}{\partial t} $$

Note that if the time functions V(x, t) can be found such
that eqn. (6.3-9) is identically equal to zero and the constraints
given by eqn. (6.3-2) and eqn. (6.3-4) are satisfied, then we would
have a solution to the original optimal control problem. For
computational reasons, however, V(x, t) is usually approximated by a power
series of the form:

$$ V(x, t) = \sum_{j=1}^{n} c_j x_j + \frac{1}{2!} \sum_{j=1}^{n} \sum_{k=1}^{n} c_{jk}\, x_j x_k + \frac{1}{3!} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} c_{ijk}\, x_i x_j x_k + \cdots \tag{6.3-10} $$

where the c's are time functions which must be determined. However,
because the c's are functions of time, the problem at hand is more
difficult than it appears. One way to overcome this difficulty is to
approximate the c's using a Taylor series as follows:

$$ c(t) = d_0 + d_1 (t - t_0) + \tfrac{1}{2} d_2 (t - t_0)^2 + O((t - t_0)^3) \tag{6.3-11} $$

where the d's are constants which must be determined.


Thus, the problem has been converted to one of finding a set
of constants rather than time varying unknowns. Since we are now
confronted with solving a finite-dimensional minimization problem, the
parallel minimization algorithms discussed in Section 3.1 can be used
to optimize the d's in eqn. (6.3-11). Also, the parallel integration
methods described in Section 3.2 may be used to integrate eqn. (6.3-2)
which is necessary to evaluate the error function (6.3-9).
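
As a small worked illustration of this parameterization, consider a scalar problem that is not taken from this report: dx/dt = u with L = x^2 + u^2 and a trial value function V(x, t) = c(t) x^2, where c(t) is the truncated Taylor series of eqn. (6.3-11). Here dH/du = 2u + lambda = 0 gives u = -lambda/2, and the HJB residual reduces to (dc/dt + 1 - c^2) x^2. The sketch below evaluates that residual and a sampled sum of squares as a discrete stand-in for an error function; sampling fixed (x, t) points instead of integrating the state equation is a simplifying assumption, and all function names are hypothetical.

```python
def c_taylor(d, t, t0=0.0):
    """Truncated series c(t) = d0 + d1 (t - t0) + 0.5 d2 (t - t0)^2."""
    s = t - t0
    return d[0] + d[1] * s + 0.5 * d[2] * s * s


def c_taylor_dot(d, t, t0=0.0):
    """Time derivative of the truncated series."""
    return d[1] + d[2] * (t - t0)


def hjb_residual(d, x, t, t0=0.0):
    """Residual e for xdot = u, L = x^2 + u^2, V(x, t) = c(t) x^2."""
    c = c_taylor(d, t, t0)
    c_dot = c_taylor_dot(d, t, t0)
    lam = 2.0 * c * x      # dV/dx
    u = -0.5 * lam         # from dH/du = 2u + lambda = 0
    # dV/dt + L(x, u) + (dV/dx) * f(x, u)
    return c_dot * x * x + (x * x + u * u) + lam * u


def squared_error(d, x_samples, t_samples, t0=0.0):
    """Sum of e^2 over sample points, to be minimized over the constants d."""
    return sum(hjb_residual(d, x, t, t0) ** 2
               for x in x_samples for t in t_samples)


if __name__ == "__main__":
    print(squared_error((1.0, 0.0, 0.0), [0.5, 1.0], [0.0, 1.0]))
    print(squared_error((0.0, 0.0, 0.0), [0.5, 1.0], [0.0, 1.0]))
```

For this toy problem the constant choice d = (1, 0, 0) makes the residual vanish everywhere, so a minimizer applied to `squared_error` has a known target against which it can be checked.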

On the basis of the results obtained in this thesis, it is
felt that the parallel Hamilton-Jacobi-Bellman (PHJB) method outlined
above should also benefit a great deal from the use of parallelism.
Since the control gains obtained by this method are, in general, time
varying, the PHJB method should provide better control than the direct
gain optimization procedure presented in Section 2.2.3. Of interest
then, would be a comparison of the response of a given system due to
the control laws designed by each method along with the number of
processors required to implement each procedure. Using this information,
a trade-off could then be made between the number of processors and
the response of a given system.

Finally, it is hoped that these remarks and the encouraging
results obtained in this thesis motivate future research in this area.

APPENDIX


PARALLEL MINIMIZATION PROCEDURES

In this appendix, the Chazan-Miranker method, the parallel
variable metric (PVM) algorithm due to Straeter, and the parallel
Jacobson-Oksman (PJO) procedure reported by Straeter and Markos are
given for reference. These methods are useful in minimizing a function
$f: R^n \to R^1$ which is assumed to be continuous and differentiable
in each variable. The gradient of f will be denoted as the function
$g: R^n \to R^n$. With these preliminary remarks, the parallel minimization
methods can be presented formally as follows:

Chazan-Miranker Procedure

Let $\ell$ represent the iteration number and define the following
quantities:

•  $UP \triangleq \{up_1, up_2, \ldots, up_n\}$ = a set of n linearly independent unit
vectors.

•  $\beta_\ell$, $\ell = 1, 2, \ldots$, $\triangleq$ a sequence of positive scalars tending to
zero.

•  $WP_\ell \triangleq up_i$ if $\ell \equiv i \bmod n$ where $i = 1, 2, \ldots, n$

•  $PT_\ell$, $\ell = 1, 2, \ldots$, $\triangleq$ a sequence of n vectors

•  $v^j_\ell$, $j = 1, 2, \ldots, n$; $\ell = 1, 2, \ldots$, $\triangleq$ a sequence of n vectors
called search direction vectors.

Then the value of x which minimizes f(x) may be obtained by performing
the following steps:

Step 1:

Determine the scalars, $\alpha^j_{\ell+1}$, by performing simultaneously
the univariate minimizations


where 


min  f(vA  +  a^+1  s£+1)  j  *  1,  2,  ....  n 
x+1 


si+i  =  IvjwJ 


Step 2:

Update the n vector, PT, such that:

$$ PT_{\ell+1} = PT_\ell + (1 + \alpha^1_{\ell+1})\, v^1_{\ell+1} / \|v^1_{\ell+1}\| $$

Step 3:

Compute $f(PT_{\ell+1})$ and terminate the algorithm if:

$$ |f(PT_{\ell+1}) - f(PT_\ell)| $$

is sufficiently small; otherwise, continue to Step 4.

Step 4:

Update the search direction n vectors, such that:

$$ v^j_{\ell+j+1} = (\alpha^{j+1}_{\ell+1} - \alpha^1_{\ell+1})\, v^1_{\ell+1} / \|v^1_{\ell+1}\| + v^{j+1}_{\ell+j}, \quad j = 1, 2, \ldots, (n-1) $$

Step 5:

Update the nth search direction vector by selecting one of
the linearly independent unit vectors from UP as follows:

$$ v^n_{\ell+n+1} = \beta_{\ell+n+1}\, WP_{\ell+1} $$

where $WP_{\ell+1}$ is chosen cyclically from the set UP.

Set $\ell \leftarrow \ell + 1$ and return to Step 1.

Parallel Variable Metric Algorithm

Let $\ell$ denote the iteration number and define:

•  $\Sigma \triangleq \{\sigma_1, \sigma_2, \ldots, \sigma_n\}$ as a set of n linearly independent vectors.

•  $V_0$ as any positive definite n x n matrix; typically $V_0 = I_n$.

Then the value of x which minimizes f(x) may be obtained by performing
the following steps:

Step 1:

a. Evaluate the function and its gradient at n distinct points
simultaneously in parallel.

$$ f(x_\ell + \sigma_j) \quad \text{and} \quad g_j = g(x_\ell + \sigma_j) \quad \forall\, j = 1, 2, \ldots, n $$

b. Compute $g_j^T g_j$ and terminate the algorithm if

$$ g_j^T g_j, \quad j = 1, 2, \ldots, n $$

is sufficiently small; otherwise, continue to Step 2.

Step 2:

a. Compute $y_j \triangleq g_j - g(x_\ell)$, $j = 1, 2, \ldots, n$

b. Compute the residual vectors

$$ r_j = V_{\ell-1}\, y_j - \sigma_j - \sum_{k=1}^{j-1} \frac{r_k^T y_j}{r_k^T y_k}\, r_k, \quad j = 2, \ldots, n $$

where

$$ r_1 = V_{\ell-1}\, y_1 - \sigma_1 $$

If the denominator in Step 2b is zero for any term, that term is
deleted from the sum.

c. Compute the n scalars,

$$ \tau_j \triangleq \begin{cases} -(y_j^T r_j)^{-1} & \text{for } y_j^T r_j \neq 0 \\ 0 & \text{otherwise} \end{cases} \qquad j = 1, 2, \ldots, n $$

and modify the metric:

$$ V_\ell = V_{\ell-1} + \sum_{j=1}^{n} \tau_j\, r_j r_j^T $$

Step 3:

a. Determine the scalar $\alpha_\ell$ by performing a single univariate search

$$ \min_{\alpha_\ell} f(x_\ell + \alpha_\ell s_\ell) $$

where

$$ s_\ell \triangleq -V_\ell\, g(x_\ell) $$

b. Update the n vector x, such that

$$ x_{\ell+1} = x_\ell + \alpha_\ell s_\ell $$

c. Compute $f(x_{\ell+1})$ and $g(x_{\ell+1})$ simultaneously in parallel and
terminate the algorithm if


is sufficiently small; otherwise, set $\ell \leftarrow \ell + 1$ and return to
Step 1.
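
The metric update of Step 2 can be sketched in a few lines. The sketch below rests on one reading of a damaged listing, namely that Step 2 applies n successive symmetric rank-one corrections with residuals r_j = V y_j - sigma_j orthogonalized against the earlier residuals; the function name `pvm_metric_update` and the tolerance are hypothetical, and the loop is written serially even though the gradient differences y_j would be obtained in parallel.

```python
def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def pvm_metric_update(V, sigmas, ys, tol=1e-12):
    """Return V - sum_j r_j r_j' / (y_j' r_j) with
    r_j = V y_j - sigma_j - sum_{k<j} (r_k' y_j / r_k' y_k) r_k.
    Terms with a (near) zero denominator are skipped."""
    n = len(V)
    previous = []                      # pairs (r_k, r_k' y_k)
    V_new = [row[:] for row in V]
    for sigma, y in zip(sigmas, ys):
        r = [vy - s for vy, s in zip(matvec(V, y), sigma)]
        for r_k, denom_k in previous:
            if abs(denom_k) > tol:
                coef = dot(r_k, y) / denom_k
                r = [ri - coef * rki for ri, rki in zip(r, r_k)]
        denom = dot(y, r)
        previous.append((r, denom))
        if abs(denom) > tol:
            for i in range(n):
                for j in range(n):
                    V_new[i][j] -= r[i] * r[j] / denom
    return V_new


if __name__ == "__main__":
    # Hypothetical quadratic with Hessian A, so that y_j = A sigma_j
    A = [[4.0, 1.0], [1.0, 3.0]]
    sigmas = [[1.0, 0.0], [0.0, 1.0]]
    ys = [matvec(A, s) for s in sigmas]
    print(pvm_metric_update([[1.0, 0.0], [0.0, 1.0]], sigmas, ys))
```

For a quadratic function and n linearly independent displacement vectors, this update recovers the inverse Hessian in a single pass provided no denominator vanishes, which is what allows the subsequent univariate search to reach the minimum directly.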
Parallel Jacobson-Oksman Procedure

Let $\ell$ denote the iteration number and define:

$\Sigma \triangleq \{\sigma_1, \sigma_2, \ldots, \sigma_{n+1}\}$ as a set of n+1 linearly independent
vectors

Then the value of x which minimizes f(x) may be obtained by performing
the following steps.

Step 0:

Let $x_0$ be the initial estimate of the minimum of f(x)
and compute $f(x_0)$ and $g(x_0)$. Set $\ell = 0$.

Step 1:

Define:

$$ x_j = x_\ell + \sigma_j, \quad j = 1, 2, \ldots, n+1 $$

and evaluate $f(x_j)$ and $g(x_j)$ in parallel.

Step 2:

Set $x_{n+2} = x_\ell$ and solve the linear system:

$$ C\, a = v $$

where

$$ c_{ij} \triangleq \begin{cases} \partial f(x_i)/\partial x_j & i = 1, 2, \ldots, n+2;\ j = 1, 2, \ldots, n \\ f(x_i) & i = 1, 2, \ldots, n+2;\ j = n+1 \\ -1 & i = 1, 2, \ldots, n+2;\ j = n+2 \end{cases} $$

and

$$ v_j = x_j^T g(x_j), \quad j = 1, 2, \ldots, n+2 $$

Step 3:

Compute the search direction vector:

$$ s_\ell = \beta - x_\ell $$

and evaluate $f(\beta)$ and $g(\beta)$. If $\|g(\beta)\|$ is sufficiently small, stop.
If not, and if $f(\beta) < f(x_\ell)$, then set $x_{\ell+1} = \beta$, $\ell \leftarrow \ell + 1$, and return
to Step 1. Otherwise, perform a line search:

$$ \min_{\lambda} f(x_\ell + \lambda s_\ell) $$

and set $x_{\ell+1} = x_\ell + \lambda s_\ell$, $\ell \leftarrow \ell + 1$ and go to Step 1.
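
The linear system of Step 2 can be exercised on a small example. The sketch below assumes the usual homogeneous-function model behind the Jacobson-Oksman approach, under which the unknown vector a stacks the estimated minimizer (the first n entries), the degree of homogeneity, and their product with the minimum value, and it assumes the right-hand side is v_i = x_i' g(x_i). The function names, the plain Gaussian elimination routine, and the test function are hypothetical choices for illustration.

```python
def solve_linear(C, v):
    """Gaussian elimination with partial pivoting (plain Python)."""
    n = len(v)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(C, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * a[k] for k in range(i + 1, n))
        a[i] = (M[i][n] - tail) / M[i][i]
    return a


def pjo_system(f, grad, points):
    """Row i of C is [g(x_i)', f(x_i), -1]; v_i is taken as x_i' g(x_i)."""
    C = [list(grad(x)) + [f(x), -1.0] for x in points]
    v = [sum(xi * gi for xi, gi in zip(x, grad(x))) for x in points]
    return C, v


# Hypothetical test function f(x) = 0.5 |x - beta|^2 + omega
BETA, OMEGA = (1.0, 2.0), 3.0


def f(x):
    return 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, BETA)) + OMEGA


def grad(x):
    return [xi - bi for xi, bi in zip(x, BETA)]


if __name__ == "__main__":
    x_l = (0.0, 0.0)
    sigmas = [(1.0, 0.0), (0.0, 1.0), (2.0, 1.0)]
    points = [tuple(xi + si for xi, si in zip(x_l, s)) for s in sigmas] + [x_l]
    C, v = pjo_system(f, grad, points)
    print(solve_linear(C, v)[:2])   # estimate of the minimizer
```

The displacement vectors were chosen so that the n + 2 evaluation points yield a nonsingular C; in practice the choice of the basis set would need the same care.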